.SEC(The dynamic structure of LISP,dynamic structure,,P107:)
.TURN ON "←";

.SS(Introduction)
****stuff this somewhere****

We will soon see
that LISP compilers for deep binding %2can%* be made efficient
in their access of local and global variables; but  the interpreter
%2will%* have to search. One of the worst cases will be the execution of a loop,
in which accesses to the same variables occur frequently.
This, perhaps, is another good reason for removing control of iteration
from the hands of the programmer.  The extent of (or even the presence of) a loop
which the user is 
controlling by tests and goto's is difficult to discover.  If a loop
is controlled by language constructs (WHILE, REPEAT,  etc.) then the 
interpreter might have some chance of improving the execution of the loop.
******

As things presently stand we have the  basic static structure of a LISP machine consisting
of %3eval%* and friends, the I/O routines, the garbage
collector, an initial symbol table which believes in the primitive 
functions and  some commonly needed utility functions. 
We have two areas of memory:#pointer space, and full word space.

Expressions (forms)  to be evaluated are read in, converted to list structure,
and evaluated by %3eval%*.  What %3eval%* is doing is traversing the 
S-expression representation of the form, interpreting the 
information found there as LISP "instructions".  This is an expensive process.
In the general case there is nothing we can do to reduce this cost, but very
frequently there is a mapping from the LISP expressions to be 
evaluated to a sequence of actual machine instructions, which when
carried out will have the same computational effect as %3eval%* massaging
the list structure.  This mapping is called a ⊗→compiler↔←. 
So a compiler is simply a useful tool for increasing the  speed of execution.

We have said nothing about how the actual execution of the expression
takes place. The essential ingredient involved here  is the handling
of control -- the dynamics of LISP execution.
How can we implement call-by-value, parameter passing, recursion, and 
conditional expressions?
This chapter will deal with the implementation of the control structures
of LISP. 
We will describe implementations of these features as we develop
compilers for LISP.
This use of compilers has many motivations:

%21.%*  To describe the compiler we must carefully define the machine
which will execute the instructions which the compiler produces.
The code generated by the compiler will reference the control primitives of the 
machine, so we might as well build a compiler at the same time we
show the dynamic structure of LISP.

%22.%*  The design of the compiler  shows a very non-trivial application
of non-numerical
computation. At the same time we will see how simple it is to
describe such complex algorithms in LISP.

%23.%*  It will clearly show the relationship between compilation and
evaluation.  That is, the LISP function representing the compiler
will very closely parallel the structure of the interpreter, %3eval%*.
If you understand %3eval%*, then the compiler is easy.

%24.%*  Everyone should see a compiler ("%2Shut up!%*", he %3explained%*).

As in the previous chapter, we will attempt to remain as abstract and aloof
as possible without losing the necessary details. 
But a meaningful
description of compilers entails an understanding of a machine.
So before the actual construction of the compilers, we will
describe a simple machine, %aSM%*,
with a sufficient instruction set to handle the control structures of LISP.


.SS(%aSM%*: A Simple machine,%aSM%*,P33:)
This section  describes a simple machine which
has a sufficient instruction set to handle an implementation of LISP.
Before we begin, note that this machine is %2not%* necessary for our
understanding of %3eval%*. %3eval%* is self-contained. We need only describe
a machine to discuss implementation and compilation.  This,
indeed is an objection to describing meaning of programming languages
in terms of a compiler -- you must understand %2two%* things now -- the
language %2and%* a machine.

The simple machine, %aSM%*, has a slight similarity to the organization of the
PDP-10. We need very few features to adequately discuss the interesting 
facets of implementation
of our LISP subset. Certainly, if we were to undertake a real implementation, many
more instructions would be necessary. Similarly, when we discuss compilation
our %aSM%* suffices, but if we wished to perform %2efficient%* compilation
we would hopefully have a better instruction set.  The point here
is to understand basic algorithms. If that is accomplished it is quite
straightforward to examine problems of efficiency and details of implementation.

%aSM%* has a conventional addressable main memory, including n special registers,
AC1, AC2, ..., ACn. These registers are called %2accumulators%*
and are to contain pointers
to the arguments of a function at the time of invocation.
Each memory location  is assumed to be large enough to contain
two addresses; the ACs will contain pointers or addresses. 
For the sake of discussion, assume the word size is 36 bits;
the addressing space is then 2%818%*. 
The mapping of a %7[~~~]~~~]%* onto a %aSM%* location is easy; the %3car%*
maps to the left-half of the word; the %3cdr%*, to the right.
A memory area to contain full-words (%7#[~~~~~~]%* ) is designated. 
All p-names and numbers are stored here.

Parts of %aSM%* memory can be designated as stacks. Each stack is a contiguous
area of memory, and the current top of a stack is referenced  by a general
register, P1, ..., Pn, called a %2stack-%* or %2pushdown-%* pointer.
The stacks will be used to contain the partial results of calculations and
will contain the information necessary to implement recursive function calling.
The primary LISP stack is named P.

There are only three main classes of instructions necessary to describe
LISP implementation: instructions for constant generation, instructions for
stack manipulation, and instructions for control of flow.

The control instructions and some of the stack instructions refer to the 
program counter of %aSM%*. %aPC%* designates this counter. %3C%* in the following
means "contents of...". Similarly, %3ac%* means any accumulator; %3loc%* means
either a memory location or an %3ac%*; and %3mem%* is reserved for memory
references. Finally, here are the initial instructions:

.BEGIN TABIT1(19);TURN OFF "←";
.SELECT 3;

MOVEI ac const\C(ac) ← const

PUSH P ac\C(P) ← C(P)+1. %1Increment stack pointer%*
\C(C(P)) ← C(ac).%1 Copy contents of %3ac%*  onto top of stack.%3

POP P ac\C(ac) ← C(C(P)). %1Copy top of stack into %3ac.
\C(P) ← C(P)-1. %1Decrement  stack pointer.%3

.BEGIN INDENT 0,19;FILL;
CALL i mem\%1This instruction handles function calling. %3i%1 is the number of arguments
(assumed to be in %3AC1%1 through %3ACi%1 at time of call). 
%3mem%* represents a function name; it is the memory location of the first executable
instruction of the function represented at %3mem%*.
.END

.BEGIN INDENT 0,19;FILL;
%3RET%1\The inverse of %3CALL%1; used for returns from functions. 
We will expand on the nature of %3CALL-RET%1 in {yonss(P167)}.
.END

.BEGIN INDENT 0,19;FILL;
%3MOVE ac loc \C(ac) ← C(loc).%1
This is an instruction to load a specified %3ac%* with the contents 
of %3loc%*.
.END

%3MOVEM ac loc \C(loc) ← C(ac).%1 Copy contents of %3ac%1 into %3loc%1.

.P153:
%3SUB ac loc\C(ac) ← C(ac) - C(loc).

%3JUMP mem\C(%aPC%*) ← mem. %1Go to location %3mem%*.

%3JUMPE ac mem\%2if %3C(ac)=0 %2then%* C(%aPC%*) ← mem;

%3JUMPN ac mem\%2if %3C(ac)%c≠%*0 %2then %3C(%aPC%*) ← mem;

.END
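
As a small illustration of these instructions (a hand-worked sketch, not part of any
compiler output), assume %3AC1%* contains some value %3v%*. Then the sequence:
.BEGIN TABIT1(19);SELECT 3;GROUP;

PUSH P AC1\%1; save %3v%* on the top of the stack %3P%3
MOVEI AC1 0\%1; load the constant %30%* into %3AC1%3
POP P AC1\%1; the saved %3v%* is back in %3AC1%*
.END
leaves %3AC1%* as it found it, and leaves %3P%* where it started since the %3PUSH%* and
%3POP%* cancel. This push-compute-pop discipline is exactly the one our compilers will
follow when evaluating the arguments of a function call.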

These instructions are executed by a machine whose basic execution cycle is
something like:
.BEGIN TABIT2(15,18);GROUP;SELECT 3;TURN OFF "←";
\%al:%3\C(%aIR%3) ← C(C(%aPC%3))
\\C(%aPC%3) ← C(%aPC%3) + 1
\\ %aexecute %3C(%aIR%3)
\\%ago to l
.END
The %aIR%*, or %aInstruction register%1, is an internal register 
used to hold the instruction we are executing. Note that the %aPC%*
register is incremented %2before%* execution of the instruction. If we
incremented after the execution, and the instruction were a JUMP-type
instruction then we would get a spurious increment.

The execution phase of the machine is an evaluation or an interpretation
just like LISP does for its basic instructions. The main difference between
LISP and %aSM%* is that LISP's %3eval%* is a recursive description whereas
%aSM%* has distinguishable sequential states. A description of LISP
in a style similar to that of %aSM%* can indeed be given using a formalism
due to P. Landin, called %2⊗→SECD-machine↔←s%*.

We will begin our use of  %aSM%* in a study of call-by-value function calls.
The evaluated arguments are to be loaded into the accumulators, in a manner
consistent with the conventions we adopted for our primitives in {yonss(P26)}.
Internal λ-expressions ({yon(P170)}) are not handled yet. 
(See problem IV on {yon(P36)}).
More instructions for %aSM%* will be given as we proceed.

.SS(On Compilation,compiler)

The relationship between compilation and interpretation should be becoming
clear:  the interpreter performs the evaluation; the
compiler emits instructions which when executed produce the same computational
effect as the evaluator.  
Since the code produced by the compiler is either in  machine language
or in a form closer to the machine than the source program, we can execute
the code much faster. A speed-up factor of ten is not uncommon.
Also in LISP we know that programs are stored as S-expressions.
Our compiled code will be machine language, thus freeing a large quantity
of S-expression storage. This will usually make garbage collection 
less time consuming.

Why not compile all programs? We already have seen that we can %3cons%*-up
new expressions to be evaluated while we are running. Still we can force
even those expressions through a compiler before execution⊗↓There are,
however, programs which simply %2cannot%* be compiled; the most obscene
examples involve self-modifying programs; an example is reluctantly
given on {yon(P161)}.←. The answer,
rather, is that for debugging and editing of programs it is extremely convenient
to have a structured representation of the program 
in memory. 
This structured representation  also simplifies the discussion of compilation.
Conventional compiler discussions always include a description of the syntax analysis
problem, since we cannot compile code until we know what the syntactic structure
of the program is. Syntax analysis, however, is really irrelevant to
a clear understanding of the function of a compiler.
Assuming the existence of the structured representation, the compiler
is conceptually very simple.

A few soothing words to forestall mutiny:  Though it is
true that syntax-directed compilation (see {yonss(P4)}) can be used to go directly
from external program to internal machine code, our contention is that
sophisticated programming systems %2need%* the structured representation
(parse tree). 
Other processors in a modern system -- the editors, and the  debuggers,
to mention two-- can profitably fondle  the parse tree. When
we wish to run the program at top speed, we can call the compiler.
The compiler can then translate the abstract representation of the program
into machine code.
We shall say more about this view of programming later.


We shall exploit the analogy between compilers and evaluators when we write the
LISP function, ⊗→%3compile%*↔←, which will describe the compiler.
We shall do this in two ways. First, the structure of the %3compile%* function
and its subfunctions will be written to capitalize on the knowledge we
have acquired from writing evaluators. Second and more difficult, we will attempt
to abstract from the specific compiler, the essence of LISP compilers without
losing all of the details. At the most general level a compiler simply
produces code which when executed has the same effect as evaluating the original
form, but this description has lost too much detail.
This second task is more difficult because we have to separate two representations
from the specific compiler. We are representing forms as data structures; and we
are also dealing with the representation of a specific machine.
The task %2is%* worth pursuing since we wish to write different compilers
for the same machine and also  a single compiler capable of easy transportation to
other machines. 

The input to
%3compile%* is the S-expr representation of a LISP function; the output is a list
which represents a sequence of machine instructions.  Assume that we have
LISP running on %aBrand X%* machine, and we have just written the LISP function,
%3compile%*, which produces code for %aBrand X%* machine. Then:
.BEGIN TABIT1(15);GROUP

\	1.  Write the S-expression form of %3compile%*.

\	2.  Read in this translation, defining %3compile%*.

\	3.  Ask LISP to evaluate:  %3compile[COMPILE]%*.
.END

Now since %3compile%* is a function of one argument which purports to compile code
for %aBrand X%* machine, translating the S-expression representation of its argument
to machine code, the output of step 3. should be a list of instructions 
representing the translation of the function %3compile%*.  That is, step 3. compiles
the compiler.

A technique called %2⊗→bootstrapping↔←%* is closely related to the process described
above. Assume that we have LISP and its compiler running on %aBrand#X%*, and
we wish to implement LISP (quickly) on %aBrand Y%*. If we have been careful in
our encoding of the %3compile%* function then %2most%* of %3compile%* is
machine independent, that is it deals mostly with the structure of LISP and
only in a few places deals with the structure of the particular machine.  As we
will see this is not too  great an assumption to make.
Notice that this is one of our admonitions:  encode algorithms in a 
representation-independent style and include representation-dependent
routines to interface with the program. To change representations simply
requires changing those simpler subfunctions. Here the representations are
machines and the algorithm is a compiling function for LISP.

Let us call those parts of the compiler
which deal with the machine, the ⊗→code generators↔←. Now if we understand the
machine organization of brands %aX%* and %aY%* then for any instruction on %aBrand#X%*
we should be able to give a (sequence of) instructions having the equivalent
effect on %aBrand#Y%*. In this manner we can change the code generators in %3compile%*
to generate code to run on %aBrand#Y%*. We thus would have a %3compile%* function,
call it %3compile*%*, running on %aX%* and producing code for %aY%*.
Now if we take the S-expr forms of %3eval, apply, read, print, compile,...%* etc.
and pass these functions through %3compile*%*, we will get a large segment
of a LISP system which will run on %aY%*. Certain primitives will have to be
supplied to run this code on %aY%*, but a very large part of LISP can be
bootstrapped from %aX%* to %aY%*.

Obviously, before we can use %2a%* compiled version of the compiler (or
in fact, before we can use any compiled code) we must have some means of
translating the list output of the compile function into real instructions in the
machine.  So if the version of LISP which we are implementing is to have a
compiler we must allocate an area of memory which can receive the compiled code.
This area is usually called %2⊗→Binary Program Space↔←%* (BPS).  The translation program
which takes the list of output from the compiler and converts it into actual
machine instructions for ⊗→BPS↔← is called an assembler. Before discussing ⊗→assemblers↔←
we will  examine a sequence of simple compilers corresponding to the LISP
subsets evaluated by ⊗→%3tgmoaf%*↔← and ⊗→%3tgmoafr%*↔←. Finally 
we give a compiler for the subset handled by an ⊗→%3eval%*↔← with local variables.

.SS(Compilers for subsets of LISP)

We will examine  compilers for increasingly complex subsets of LISP
beginning with functions, composition and constant arguments and
ending with a more realistic compiler for a reasonable subset of
pure LISP. Though each subset is a simple extension of its predecessor,
each subset introduces a new problem to be solved by the compiling algorithm.
If the corresponding evaluator (⊗→%3tgmoaf%*↔←, ⊗→%3tgmoafr%*↔← and the most
simple ⊗→%3eval%*↔←) is well understood, then the necessary steps
to be added to the compiler are easy to make.

First we describe a list of conventions which will remain in
effect throughout this sequence of compilers.  The code which is generated
must be compatible with that which incorporates the basic LISP machine.
In particular the calling conventions for function invocation must be
maintained.  

.GROUP
.BEGIN CENTERIT;SELECT 2;
←Compiling Conventions
.END

.BEGIN INDENT 0,10,10;
%21.%* A function of %3n%* arguments expects its arguments to
be presented in %3AC1%* through %3ACn%*. Also the execution of any code which is
computing arguments to a function call must store temporary results.
Since the computation  of an argument can be  arbitrarily complex, even recursive,
we cannot store the temporary results in fixed locations. Therefore 
as we compute each argument, we store
the result on the stack, %3P%*. 
Thus the execution sequence of code for computing %3n%* arguments
should be: 
.END

.BEGIN centerit;
←compute value of argument%41%*, push onto stack, %3P%*.

←..........
←compute value of argument%4n%*, push onto stack, %3P%*.
.END
.APART

.BEGIN INDENT 10,10,10;GROUP
So after this computation the values of the arguments are stored
V%4n%*, V%4n-1%*, ..., V%41%*, from top to bottom in %3P%*.
Thus to complete the function invocation, we need only pop the arguments
into the %3AC%*'s in the obvious order. 
.END

.BEGIN INDENT 0,10,10;GROUP
.P174:
%22.%* When the function completes evaluation, it is to place that value
in %3AC1%*. Nothing can be assumed about the contents of any other %3AC%*. So if an 
%3AC%*  contains information we need then it must be saved on the 
stack %2before%*  calling the function.

.APART
.GROUP
%23.%* This brings us to one of the most important conventions
for %2any%* compiler.
We must always be sure that the integrity of the stack is maintained. Whatever
we push onto the stack within the body of a function %6must%* be
popped off before we exit from the function body. That is, the state of the stack
must be transparent to any computations which occur within the function.
This is called %2⊗→stack synchronization↔←%*.
.APART
.GROUP


%24.%* Finally some details of the output from the compilers: the output
is to be a list of instructions, sequenced in the order which we
would expect to execute them.  Each instruction is a list: an operation
followed by as many elements as are required by that operation.
.APART

.GROUP
%25.%* Instead of referring to %3AC1, AC2, ..., ACn%* we will simply
use the numbers, %31, 2, ...,n%* in the instructions.
.APART

The first compiler will handle representations of that
subset of LISP forms defined by the following BNF equations:

.BEGIN TABIT1(11);GROUP

<form>\::= <constant> | <function>[<arg>; ...; <arg>] | <function> [ ]

<arg>\::= <form>

<constant>\::= <sexpr>

<function>\::= <identifier>

.END

This LISP subset corresponds closely with ⊗→%3tgmoaf%*↔←, handling only
function names, composition, and constant arguments.
Our previous compiling conventions will keep us in good stead. 
It should now be clear how to proceed with the first expression-compiler.
In the interest of readability and generality, we will write the functions using
constructors, selectors, and recognizers and supply the necessary
bridge to our specific representation by simple sub-functions.

.BEGIN SELECT 3;TABIT2(11,18);
.GROUP

compexp <=\λ[[exp]\[isconst[exp] → list[mkconst[1;exp]];
\\ %et%* → compapply[fun[exp];complis[args[exp]];length[args[exp]]]] ]

.FILL
%3compexp%1 expects to see either a constant or a function followed by a
list of zero or more arguments. The appearance of a constant should
elicit the generation of a list containing a single instruction to load
%3AC1%* with the representation of that constant; %3mkconst[1;exp]%*
is a call on the constructor to generate that instruction.
In any other case we should generate code to evaluate each argument, 
and follow that with code for a function call.
.NOFILL
.APART
.GROUP

%3complis <=\λ[[u]\[null[u] → ( );
\\ %et%* → append[compexp[first[u]]; list[mkpush[1]]; complis[rest[u]]]] ]

.FILL
%3complis%1 gets the representation of the argument list; it must
generate a code sequence to evaluate each argument and push the result onto the
stack from %3AC1%*. Notice that we have used %3append%* with three
arguments; this could be justified by defining %3append%* as a macro.
.NOFILL

.APART
.GROUP
.FILL
%3compapply%1 has a simple task; it takes the  list of instructions
made by %3complis%* and adds instructions at the end of the list
to get the partial computations out of the stack and into the
 %3AC%4i%1's and finally generates a function call on %3fn%*.
Here's %3compapply%*:
.NOFILL
.P82:
%3compapply <= λ[[fn;args;n]append[args;loadac[n];list[mkcall[n;fn]]]]

.APART
.P56:
.GROUP
.FILL
%3loadac%1 is the last function; it generates a sequence of instructions
to pop the top %3n%* elements of the stack into the correct %3AC%4i%1's.
The top of the stack goes to %3ACn%* and so on, with the n%8th%*  element
down the stack going into %3AC1%*.

%3loadac <= λ[[n][zerop[n] → ( );%et%* → concat[mkpop[n];loadac[n-1]]]]%1

.APART
.END
.END

Now to add the representational material:
The format for calling an %3n%*-ary function, %3fn%*, is:
.BEGIN CENTERIT;SELECT 3;
←(CALL n (E fn))
.END
The %3E%*-construct will be explained in {yonss(P7)}.
.APART
We need to compile code for constants; this is the duty of 
the constructor %3mkconst%*. A number will be seen as a numeral;
other S-expression
constants will be seen as  %3(QUOTE ...)%* by the compiler. Since the
convention %22%* on {yon(P174)} says the value of an expression is to 
be found in AC1, the
instruction which we wish to generate is:

.BEGIN TABIT1(30);SELECT 3;GROUP

\(MOVEI 1 exp)
.END

.SELECT 1;
Finally, here are the constructors, selectors, and recognizers:
.BEGIN TABIT3(5,28,48);SELECT 2;GROUP;

\Recognizer\Selector\Constructor
.SELECT 3;

isconst <= λ[[x]eq[first[x];QUOTE]%a∨%*numberp[x]]\mkconst <= λ[[ac;x] list[MOVEI;ac; x]]
\\fun <= λ[[x]first[x]]\mkpush <= λ[[ac]list[PUSH;P;ac]]
\\args <= λ[[x]rest[x]]\mkcall <= λ[[n;fn]list[CALL;n;list[E;fn]]]
\\\mkpop <= λ[[ac]list[POP;P;ac]]

.END


Note  that  %3compexp%* is just a
complex %9r%*-mapping whose image is a sequence of machine language instructions.
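
To see these pieces cooperating, here is a hand evaluation of %3compexp%* (a sketch,
using the constructors above) on the representation of %3f[A;B]%*:
.BEGIN TABIT2(10,17);SELECT 3;GROUP;

\compexp[(F (QUOTE A) (QUOTE B))] =
\\((MOVEI 1 (QUOTE A))
\\ (PUSH P 1)
\\ (MOVEI 1 (QUOTE B))
\\ (PUSH P 1)
\\ (POP P 2)
\\ (POP P 1)
\\ (CALL 2 (E F)))
.END
%3complis%* produces the first four instructions, %3loadac[2]%* supplies the two %3POP%*s,
and %3mkcall%* generates the final %3CALL%*.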

The code generated by this compiler is clearly inefficient, but that
is not the point. We wish to establish an intuitive and,
hopefully, correct compiler, then later 
worry about efficiency. Worrying about efficiency before establishing
correctness is the ultimate folly.

.CENT(Compilation of conditional expressions)
Let's look at the possibility of compiling a larger set of LISP constructs.
The innovation which occurred in %3tgmoafr%* was the appearance of conditional 
expressions. To describe conditional expressions, the above BNF equations
were augmented by:

←<form> ::= [<form> → <form> ; ... <form> → <form>]

Certainly the addition of conditional expressions will mean an extra 
piece of code in %3compexp%*
to recognize %3COND%* and a new function (analogous to %3evcond%* in %3tgmoafr%*)
to generate the code for the %3COND%*-body.
In fact, the only difference between %3evcond%* and its counterpart in 
%3compexp%*, which we shall call %3comcond%*, is that %3comcond%* generates
code which when executed will produce the same value as the value
produced by %3evcond%* operating on the given S-expression.

The effect of %3comcond%* on a conditional of the form:

.P103:
%aCOND%*←%3(COND%* %3(%eR%f(%3 p%41 %f)%3  %eR%f(%3 e%41%f)%3 ) ... (%eR%f(%3 p%4n%3 %f)%3 %eR%f(%3 e%4n %f)%3))
%1 should be clear.

First generate code for p%41%*; generate a test for truth, going to the
code for e%41%* if true, and going to the code for p%42%* if not true.
The code for e%41%* must be followed by an exit from the code for the
conditional, and we should probably generate an error condition to be executed
in the case that p%4n%* is false.

.BEGIN NOFILL;GROUP;
Pictorially, we might represent the code as:

.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
		∞1<code for p∞41∞1>∞g
		     ≤'   `≥ 
                   ≤'       `≥
                 ≤'           `≥
	        T 	        NIL
             ≤'      	          `≥
           ≤'                       `≥
    ∞1<code for e∞41∞1>∞g                  ∞1<code for p∞42∞*>∞g
       ≤' 			      ≤'`≥
     ≤' 			    ≤'    `≥ 
   ≤' 		                  ≤'        `≥
 ≤'		                ≤'            `≥
 `≥			      T  	        NIL
   `≥		            ≤'       	          `≥
     `≥ 	          ≤'                        `≥
       `≥ 	    ∞1<code for e∞42∞1>∞g                  ∞1<code for p∞43∞*>∞g
         `≥           ≤'
           `≥	    ≤'             			....
             `≥	  ≤'		                  ≤'        `≥
               `≥'		                ≤'            `≥
                 `≥			       T	        NIL
				            ≤'       	          `≥
		     ...	          ≤'                        `≥
				  ∞1<code for e∞4n∞1>∞g	                ∞1<code for error∞*>∞g
			  `≥	      ≤'				   
	                    `≥      ≤'					   
                              `≥  ≤'
				`≥ 
				  `≥
				    `≥
				     ~
				     ↓
.END

.END

This requires the establishment of more conventions for our compiler.

.GROUP
.BEGIN CENTERIT;SELECT 2;
←More Compiling Conventions
.END

.BEGIN INDENT 0,10,10;
%26.%* In particular we must be able to test for %et%* (or %ef%*). We will
define %ef%* to be the zero of the machine, and we will test for
%ef%* using the %3JUMPE%* instruction
(see {yonss(P33)})⊗↓Note that this implementation maps %ef%* to zero
and everything else to non-zero. Thus any value other than
%ef%* will be considered %et%*. This subterfuge is useful sometimes.←. 
We will test %3AC1%* since the value returned
by p%4i%* will be found there.
.END
.APART

Next, since our code is to be a %2sequence%* of instructions,
we must linearize this graph-representation of the generated code.
We will utilize a standard trick using "labels" and "go to"s.
Thus  for the compilation of %aCOND%*, {yon(P103)},
we will generate:

.BEGIN TABIT2(30,35);GROUP

\%3(\%1<code for p%41%*>%3
\\(JUMPE 1, L1)%1
\\<code for e%41%*>%3
\\(JUMP L)
\L1\%1<code for p%42%*>%3
\\(JUMPE 1, L2)
\        ...
\Ln-1\%1<code for p%4n%*>%3
\\(JUMPE 1, Ln)%1
\\<code for e%4n%*>%3
\\(JUMP L)
\Ln\(JUMP ERROR)
\L\   )
.END
%1

To generate such code we need to be able to generate the labels, %3Li%*.
The generated labels should be atomic and should be distinct. LISP has
a function, ⊗→%3gensym%*↔←, which can be used for this task. %3gensym%*
is a function of no arguments  whose value is a generated symbol or atom
which is guaranteed to be distinct from any atom generated in any of
the usual manners. In many versions of LISP, gensyms are of the form
%3Gn%* where %3n%* is a four digit number beginning at %30000%*. Thus
the first call of %3gensym[ ]%* would give %3G0000%*; succeeding calls would give
%3G0001, G0002%*, etc.
Two properties of gensyms are worth noting. First, gensyms are not placed in the object list since
they are usually used only for their unique name. Second, if you explicitly
place them in the symbol table they are obviously distinct from each other and
will be distinct from all other atoms unless, of course, you read in such an atom.

What we must now do is write a LISP function, call it %3comcond%*,
which will generate such code. %3comcond%* will 
be recursive. We must thus determine what code gets generated on each
recursion and what code gets generated at the termination case.
Looking at the example we see that the block 

.BEGIN CENTERIT;
←%3(%1<code for p%4i%*> %3(JUMPE 1 Li), ... %3(JUMP L) Li)%1
.END
is the natural segment for each recursion and that:
.BEGIN CENTERIT;
←%3((JUMP ERROR) L)%1
.END
is a natural for the termination case. We notice too that within each
block we need a "local" label, %3Li%1; and that within each block,
including the terminal case, we refer to the
label %3L%* which is "global" to the whole conditional.

Without too much difficulty we can now add the recognizer for
%3COND%* to %3compexp%* and construct %3comcond%*.

.BEGIN SELECT 3; TABIT2(15,22);TURN OFF"←";GROUP;
.FILL

%1Add%* iscond[exp] → comcond[args[exp];gensym[ ]]; %1to%3 compexp 
%1where:
.NOFILL
%3

comcond <= λ[[u;glob]
\[null[u] → list[mkerror[ ];glob];
\ %et%* → append[comclause[first[u]; gensym[ ];glob];comcond[rest[u]; glob]]
\ ]]

comclause <=λ[[p;loc;glob]
\append[compexp[ante[p]];
\\list[mkjumpe[loc]];
\\compexp[conseq[p]];
\\list[mkjump[glob];loc]
\\]
\ ]

.END

.BEGIN CENTERIT;SELECT 3;GROUP;
←iscond <= λ[[x]eq[first[x]; COND]]

←args <= λ[[x] rest[x]]
←ante <= λ[[c] first[c]]
←conseq <= λ[[c] second[c]]

←mkerror <= λ[[](JUMP ERROR)]
←mkjumpe <= λ[[l]list[JUMPE;1;l]]
←mkjump <= λ[[l]list[JUMP;l]]

.END
.SELECT 1;

.BEGIN GROUP;TABIT2(30,38);
The partially exposed recursive structure of %3comcond%* would show:

←%3comcond[((%1p%41%*  e%41%3) ...(%1p%4n%*  e%4n%3));G0000]=

\%3(%1\<code for p%41%*>%3
\\(JUMPE 1 G0001)%1
\\<code for e%41%*>%3
\\(JUMP G0000)
\ G0001\comcond[((%1p%42%* e%42%3) ...(%1p%4n%* e%4n%3));G0000])

.END
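
Carried to completion on a one-clause conditional, such an expansion gives the following
(a hand evaluation; assume the two calls on %3gensym%* return %3G0000%* and %3G0001%*
in that order):
.BEGIN TABIT2(10,17);SELECT 3;GROUP;

\compexp[(COND ((NULL (QUOTE A)) (QUOTE B)))] =
\\((MOVEI 1 (QUOTE A))
\\ (PUSH P 1)
\\ (POP P 1)
\\ (CALL 1 (E NULL))
\\ (JUMPE 1 G0001)
\\ (MOVEI 1 (QUOTE B))
\\ (JUMP G0000)
\\ G0001
\\ (JUMP ERROR)
\\ G0000)
.END
The lone atoms %3G0001%* and %3G0000%* are the labels; it is the assembler of the next
section which must recognize them and resolve the references made by the %3JUMP%*s.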

.SELECT 1;
Before attacking the major problem of our compilers, the handling of
variables, we shall describe how the list representing the output
of the compiler is turned into code which runs on the machine.


.CENT(Problems)

.BEGIN TABIT1(16);
I. Evaluate: 
%3compexp[(COND(\(EQ (QUOTE A)(QUOTE B))(QUOTE C))
\((NULL (QUOTE A))(QUOTE FOO)))]


****more, more ****

.END
.SS(One-pass Assemblers,assemblers,P7:)

In this section we discuss the mechanism for getting the LISP list, representing
instructions, turned into real instruction in Binary Program Space.
Part of the process involves the actual instructions; part of the process involves
establishing a link between the code in memory and the LISP evaluator. Thus
when calling a routine which has been compiled, the evaluator must know where
to find it. Both tasks are accomplished by the assembler. To illustrate
the points we will define %3compile%* as:

.BEGIN CENTERIT;GROUP;
←%3compile <= λ[[fn;vars;exp]append[list[mkprolog[fn]];compexp[exp];list[mkepilog[]]]]%*; where:

←%3mkprolog <= λ[[f]list[LAP;f;SUBR]]%1
←%3mkepilog <= λ[[](RET)]%1.
.END

The effect of this %3mkprolog%* is to generate 
a piece of information that the assembler
will use to notify LISP that %3fn%* is a machine-language version of an %3EXPR%*.
Since we are now generating  code for a function definition, we must also add 
an instruction to allow us to return to whoever called the function; %3RET%*,
generated by %3mkepilog%*,
accomplishes this. We will discuss the actual %3CALL-RET%* mechanisms in
{yonss(P167)}.

Now for an example. Consider the output from %3compile%* for the function:
.GROUP

%3←j [] <= f[g[A];h[B]]%* 

or more precisely, for the evaluation of

←%3compile[J;();(F(G(QUOTE A))(H(QUOTE B)))]%*.
.APART
.GROUP
.P169:

It would be a list of the form:
.BEGIN TABIT2(10,45);SELECT 3;
\((LAP J SUBR)\%1; says %3j%* is a function%3
\(MOVEI 1(QUOTE A))\%1; make argument for call on %3g
\(PUSH P 1)
\(POP P 1)
\(CALL 1 (E G))\%1; call the function%3
\(PUSH P 1)\%1; save the value%3
\(MOVEI 1(QUOTE B))
\(PUSH P 1)
\(POP P 1)
\(CALL 1 (E H))
\(PUSH P 1)
\(POP P 2)
\(POP P 1)
\(CALL 2(E F))
\(RET)
\)

.END
.APART
%1

The machine representation of these instructions is an encoding of
specific fields of specific machine locations with specific numbers.
For example, the operation %3PUSH%* is represented as a certain number,
called its %2⊗→operation code↔←%* or %2op code%*, which will
occupy a certain area of a machine word so that the CPU can interpret
it as an instruction to push something onto a stack.  Other fields in the
instruction are to be interpreted as references to stacks, to memory locations,
to accumulators, constants or external references to other routines.
The purpose of an assembler is to translate these mnemonic instructions
into machine instructions.

Essentially all that the assembler need do is search ⊗→symbol tables↔←
for the opcodes, for subroutine names, for accumulator and stack names,
and store the resulting values in the appropriate machine locations.

Things are slightly more complex: we must also %6add%* information to
the tables as we proceed.  For example, as we assemble the code for a new
routine we must add its name and starting address to the current symbol
table.  The phrase, %3 (LAP %1name%3 SUBR)%* does this.

We must exercise a bit of care in handling  %3QUOTE%*d expressions.
Assembling a construct like %3(MOVEI 1 (QUOTE (A B C)))%* should
have the effect of constructing the list %3(A B C)%* in free space
and placing an instruction in memory to load the address of this list into %3AC1%*.
What we must notice is that this list %3(A B C)%* is subject to
garbage collection and, if left unprotected, could be destroyed.
There are a couple of solutions. Perhaps the garbage collector could
look through compiled code for any references to free- or full-word- space;
or we could make a list of all of these constants and let the garbage
collector mark the list.
Looking through compiled code is expensive; keeping a %3QUOTE%*-list
is a reasonable compromise. It is a compromise since that strategy
might retain unnecessary structures in case functions were redefined or
recompiled.

Much more problematic is the handling of labels and references to labels.
This case arose in the compiler for ⊗→%3tgmoafr%*↔← when we introduced
conditional expressions. Recall that the code for a ⊗→conditional expression↔←
of the form:

.GROUP
%3←(COND (%1p%41%* e%41%3) ... (%1p%4n%* e%4n%3)) %1 was:

.BEGIN TABIT1(35);

\    <code for p%41%*>
\    %3(JUMPE AC1 L1)%1
\    <code for e%41%*>
\    %3(JUMP L)
\L1  %1<code for p%42%*>
\    %3(JUMPE AC1 L2)%1

\          ....
\    <code for e%4n%*>

\%3L        ....

.END
.APART
%1

The symbols, %3 L, L1,%* and %3L2%* in this example are labels.  Notice
that if we were to consider writing the assembler in LISP,
we could distinguish occurrences of labels from instructions using
the predicate, %3atom%*.

It is time to start thinking about writing such an assembler.
One of the arguments to the function should be the list representation
of the program.  One of its arguments should also describe where (in ⊗→BPS↔←)
we wish the assembled code to be located. We should also have access
to an initial symbol table, describing the opcodes and pre-defined
symbol names. Let's call the function, ⊗→%3assemble%*↔←.
%3assemble%* then can go %3rest%*-ing down the program list, looking up
definitions and manufacturing the numerical equivalent of each 
instruction, then depositing that number in the appropriate machine
location.  If %3assemble%* comes across a label definition, it should
add information to a symbol table, reflecting that the label has
been seen and that it is associated with a specific location in memory.
Then future references to that label can be translated to references
to the associated machine location.  The only problem is that references
to labels may occur %6before%* we have come across the label definition 
in the program.  Such references are called %2⊗→forward reference↔←s%*.

.GROUP
For example, consider the following nonsense program:

.BEGIN TABIT1(35);SELECT 3;

\(  (LAP FOO SUBR)
\ X (JUMP X)
\    (JUMP Y)
\ Y (JUMP X) 
\     (RET))

.END
.APART
The reference to %3Y%* is a forward reference; neither of the references to %3X%*
is forward since %3X%* is defined before being referenced.


There are two solutions to the ⊗→forward reference↔← problem:

.BEGIN INDENT 0,5;
%21.%*  Make two passes through the input program.  The first pass
decides where the labels will be assigned in memory.  That is, this
pass builds a symbol table of labels and locations.  Then a second pass
uses this symbol table to actually assemble the code into the machine.
This works, but is not particularly elegant. It is expensive to read through
the input twice, particularly when we can do better.

%22.%*  The other solution is to make one clever pass through the input.
Whenever we come across a forward reference we add information to the symbol table
telling that a forward reference has occurred at this location.  We assemble
as much of the instruction as we can.  When a label definition %6does%*
occur we check the table to see if there are any forward references pending
on that label.  If there are such, we  %6⊗→fix-up↔←%* those instructions to
reference the location now assigned to the label.

.END
A speed-up by a factor ranging from two to five is not uncommon for a good
one-pass assembler.
There are some restrictions which are imposed on one-pass assemblers, but
particularly for assembling compiled code, one-pass assemblers are usually sufficient.


.<<FIXUPS>>
There are at least two ways to handle fixups and forward references.  If
the fixups are of a particularly simple nature, say only requiring fixups
to the address-part of a word, then we may link those pending forward references
together, chaining them on their right halves.
.BEGIN GROUP
Thus:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

		        ⊂αααααπααααα⊃
		        ~     ~  ≤' ~←α⊃
		        εαααααβαααααλ  ~
		        ~     ~     ~  ~
		        εαααααβαααααλ  ↑
		        ~     ~     ~  ~
		        εαααααβαααααλ  ~
			~     ~  #ααβαα$
			~     ~     ~←α⊃
		        εαααααβαααααλ  ~
                        ~     ~     ~  ↑
		        εαααααβαααααλ  ~
		        ~     ~     ~  ~
		        εαααααβαααααλ  ↑
			~     ~  #ααβαα$
			~     ~     ~←α⊃
		        εαααααβαααααλ  ~
                        ~     ~     ~  ↑
		        εαααααβαααααλ  ~
                        ~     ~     ~  ↑
		        εαααααβαααααλ  ~
		        ~     ~     ~  ~
		        εαααααβαααααλ  ↑
pointer from  αααααα→	~     ~  #ααβαα$
symbol table            %ααααα∀ααααα$  
.END
.begin centerit select 2;
←A Simple Fixup Scheme
.END
.END
Thus each time a forward reference is seen it is added to the linked list;
when the label is finally defined and given an address in memory then the
address replaces each reference link. No extra storage is used since the
linked-list is stored in the partially assembled code.
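
For instance, in the nonsense program %3FOO%* given earlier (and assuming, as in the example at
the end of this section, that assembly begins at location %3104%*), the forward reference to %3Y%*
in the %3(JUMP Y)%* assembled at %3105%* would leave the address field of %3105%* marking the end
of the chain, with the symbol table entry for %3Y%* pointing at %3105%*. Had the program contained
a second (hypothetical) reference to %3Y%*, say at %3107%*, its address field would be set to
%3105%* and the table entry changed to point at %3107%*. When %3Y%* is finally defined as %3106%*,
the assembler walks the chain %3107%*, %3105%*, planting %3106%* in each address field.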

Another solution, which is potentially more general (that is, it could
handle left-half, right-half or partial-word fixups) is to store the information
about each fixup in the symbol table under each label. 
.BEGIN GROUP
Thus:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
        from symbol table entry
		    ~	⊂αααααπααα⊃   ⊂αααααπααα⊃   ⊂αααααπααα⊃
		    %αα→~  #  ~ #αβαα→~  #  ~ #αβαα→~  #  ~  . . .
        ⊂ααααααααααα⊃   %ααβαα∀ααα$   %ααβαα∀ααα$   %ααβαα∀ααα$
        ~           ~      ↓             ↓             ↓
        εαααααααααααλ      ~             ~             ~
        ~           ~←ααααα$             ~             ~
        εαααααααααααλ                    ↓             ↓
        ~           ~                    ~             ~
        εαααααααααααλ   		 ~ 	       ~
        ~           ~←ααααααα←ααααααα←ααα$	       ~
        εαααααααααααλ   			       ↓
        ~           ~←ααααααα←ααααααα←ααααααααα←ααααααα$
        εαααααααααααλ   
        ~           ~   
        ε    . . .  λ
.END
.BEGIN CENTERIT; SELECT 2;
←Another Fixup Scheme
.END
.END
In this scheme we could store additional information with each link in the
list. That information could tell the fixup routine how to modify the
designated location.

Both methods are useful. Both have been used extensively in assemblers and
compilers.

To write the function, %3assemble%*, we will need two functions:
.BEGIN INDENT 0,10;

%21.%*  ⊗→%3deposit%*↔←%3[x;y]%*: %3x%* represents a machine address; %3y%* is a list,
representing the instruction to be deposited. %3y%*  could be a list of
 elements: %3(%*opcode, accumulator number, memory address%3)%*
The value of %3deposit%* is the value of %3y%*.

%22.%*  ⊗→%3examine%*↔←%3[x]%*: %3x%* represents a machine address.  The value
of %3examine%* is the contents of location %3x%* in the form of a list as
specified above.
.END
We can now use our fixup mechanism, together with %3examine%* and %3deposit%*, and the
%3putprop%* and %3remprop%* of
{yon(P52)}, to write the parts of the assembler which worry about forward
references and labels.
We will apply the second fixup scheme. If the label has been assigned a location
then the property list of the label will contain the indicator %3SYM%* and an
associated value representing the assigned location.
If the label has %2not%* been previously defined but has been referenced then
the atom will have an indicator %3UNDEF%*; the value-part will be a list
of all those locations which reference that label. Since we will only
be doing simple fixups, this will be sufficient. The contents of any location
referenced from such a fixup list will be a partially assembled word⊗↓instruction
or data← with the memory address portion set to 0. 
When the label  finally is defined we must perform the fixups, 
delete the %3UNDEF%* pair, and add a  %3SYM%* pair.
There are two main functions. 
%3defloc%* is called when a label has been defined; if there are no pending forward
references then the %3SYM%* pair is simply added. Otherwise the fixup mechanism is
exercised. %3gval%* is called when a label is referenced. If it is already defined
then simply return the %3SYM%* value; otherwise add a forward reference to the
waiting list. The functions follow; in both, the value of %3loc%* is  to reference
the current machine location receiving assembled instructions.

.BEGIN SELECT 3;GROUP;TABIT2(15,22);TURN OFF "←";

defloc <= λ[[lab;loc]prog[[z]
\\[z ← get[lab;UNDEF] → go[fixup]]
\a\return[putprop[lab;loc;SYM]]
\fixup\deposit[car[z];fixit[examine[car[z]];loc]]
\\z ← cdr[z];
\\[null[z] → remprop[lab;UNDEF];go[a]]
\\go[fixup] ]]
.END

.BEGIN SELECT 3;GROUP;TABIT1(15);

gval <= λ[[lab]\[get[lab;SYM];
\ %et%* → putprop[lab;cons[loc;get[lab;UNDEF]];UNDEF]0]]
.END

%3fixit <= λ[[l;add]mkinstr[op[l];ac[l];add;ind[l]]]%*

Notes: %3gval%* is full of pitfalls.
.BEGIN INDENT 10,10;GROUP;

%21%*.  %3loc%* is a global variable.

%22%*.  There is no e%41%*; we assume that if p%41%* evaluates to something not
equal to %3NIL%*, then that value is the value of the conditional expression.

%23%*.  Otherwise  the %3putprop%* is executed and %30%* is returned.
.END
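
To see %3defloc%* and %3gval%* cooperating, here is a hand trace (a sketch) of the nonsense
program %3FOO%* given earlier, assuming assembly begins at location %3104%*:
.BEGIN INDENT 0,5;GROUP;

%21.%*  The label %3X%* is defined first: %3defloc[X;104]%* finds no %3UNDEF%* entry and so
simply executes %3putprop[X;104;SYM]%*.

%22.%*  Assembling %3(JUMP X)%* at %3104%* calls %3gval[X]%*, which finds the %3SYM%* pair and
returns %3104%*.

%23.%*  Assembling %3(JUMP Y)%* at %3105%* calls %3gval[Y]%*; %3Y%* is not yet defined, so
%3(105)%* is stored under the indicator %3UNDEF%* and the address field of location %3105%* is
assembled as %30%*.

%24.%*  When %3Y%* is defined, %3defloc[Y;106]%* finds the pending reference, fixes up
location %3105%* to address %3106%*, removes the %3UNDEF%* pair and adds the %3SYM%* pair.
.END
The resulting contents of locations %3104%* through %3106%* appear in the example at the end
of this section.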

.BEGIN TURN ON "#";

%3assemble%* also needs to recognize that there are different instruction formats.
That is, some instructions use an opcode and a memory reference: %3(JUMP#L)%*;
some use an opcode, accumulator, and an address: %3(PUSH#P#1)%*; and some
vary: %3(MOVEI#1#(QUOTE A))%* and %3(MOVE#1#-1#P)%*. %3assemble%* also
has to have an initial symbol table of opcodes, accumulator numbers, and
stack numbers.
.END

.BEGIN TABIT2(35,45);GROUP;
Here is a sample table of op-codes and their machine equivalents:

%2\symbol\value%3
\MOVE\200
\MOVEI\201
\SUB\274
\JUMP\254
\JUMPE\322
\JUMPN\326
\PUSH\261
\POP\262
\POPJ\263
\RET\263
\CALL\034
\P\14
.END

.GROUP
So for example:

←%3assemble[((LAP FOO SUBR) X (JUMP X) (JUMP Y) Y (JUMP X));104]%* should

have the final effect:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 52,57;

  ⊂ααπαααα⊃   ⊂ααααααπαααα⊃ ⊂αααπααα⊃  ⊂αααααααπααα⊃ ⊂αααααπααα⊃
  ~∃ ~  #αβαα→~ SUBR ~  #αβ→~ # ~ #αβα→~ PNAME ~ #αβ→~  #  ~     ...
  %αα∀αααα$   %αααααα∀αααα$ %αβα∀ααα$  %ααααααα∀ααα$ %ααβαα∀ααα$
 			      ~				~
			      ~			      . . .
			      ↓
			      ~
			      ~             op     ac  ind add
			      %ααα→    104  254    0   0   104
				       105  254    0   0   106
				       106  254    0   0   104

.END
.APART
.SS(A compiler for simple %3eval%*: the value stack,compiler,P32:)
The major failing of the previous %3compexp%* 
({yonss(P7)}) is its total lack of
interest in variables. The handling of variables is  
a  non-trivial problem  which every
compiler-writer must  face.   
The second shortsightedness of  %3compexp%* is its inability to 
compile code for function definitions. We will alleviate both difficulties in this
section. 

First, from {yon(P169)}, we know what %3compexp%* will do with:

←%3f[g[A];h[B]]%*

.BEGIN GROUP
The following is a  slightly optimized version of the code:
.TABIT2(10,45);SELECT 3;

\(MOVEI 1 (QUOTE A))\%1; get %3A%* into %31.
\(CALL 1 (E G))\%1; call the function named %3g
\(PUSH P 1)\%1; save the value%3
\(MOVEI 1 (QUOTE B))\%1; get %3B%1 into %31
\(CALL 1(E H))\%1; call %3h
\(PUSH P 1)
\(POP P 2)\%1; restore the arguments in%3
\(POP P 1)\%1;   preparation for%3
\(CALL 2(E F))\%1;   calling %3f
.END
No surprises yet. What would we expect to see for a compiled version of:

←%3f[g[x];h[y]]%*?

We %2should%* expect to see the same code except that we would have 
instructions to get %3x%* and %3y%* into %31%* at the appropriate time. 
So the first problem is 
how to find the values of variables. Let's make things slightly
more difficult and add 
another problem. Assume we  are really interested in compiling:

←%3j <= λ[[x;y]f[g[x];h[y]]]%*

The added problem makes our question easier. Consider a call on %3j%*: %3j[A;B]%*
in fact. We know that the execution of the call occurs after the values %3A%* and 
%3B%* have been set up in %3AC1%* and %3AC2%*. Thus when we enter %3j%* the
%3AC%*'s have been set up. Thus at %2that%* time we do indeed know what the
values of %3x%* and %3y%* are supposed to be. We could invoke the bind-unbind
mechanism of %3eval%* and store these values in the %3VALUE%*-cells. However,
since these are assumed to be %2local%* variables⊗↓Thus no one within the bodies of
either %3g%* or %3h%* used %3x%* or %3y%* free. We will worry about
compilation of global references later.←, we can use a simpler mechanism.
We will save these values in the top of the stack, %3P%*. 
Thus %3P%* is also called the %2⊗→value stack↔←%* since it contains the
values of partial computations, and now also contains the values of the
local λ-variables⊗↓Yes, this %2is%* a value stack similar to that
described in  deep-binding ({yon(P148)}); however we do not need the name stack
here. This is because the compiler will %2know%* where on the stack values of
local variables can be found. It will put them there so it %2should%* know.
This lack of a name stack is a mixed blessing; we don't need the space since we
don't have to search the name stack; however we also have lost the names.
The names are useful to have if we are debugging code.←.
 We will save the values on the stack by 
prefixing the %3compexp%*-code with the usual %3PUSH%* operations:
.BEGIN TABIT1(34);SELECT 3;GROUP
\(PUSH P 1)
\(PUSH P 2)
.END
.P163:
Thus, after execution of this pair of instructions, the value of %3y%* is on the
top of the stack, and the value of %3x%* is the next element down⊗↓An observant
reader would have seen that the  %3PUSH%*  for %3x%* was unnecessary. 
Since we have assumed that %3x%* and %3y%* are strictly local, and since no one
else needs the value of %3x%* except for %3g[x]%*, we could have simply computed
%3g[x]%* directly. The cavalier reader would have assumed that we could have left
%3B%* sitting in %3AC2%* as we calculated %3g[x]%*; we cannot do that, as %3g[x]%*
might very well use %3AC2%*. Thus we must %3PUSH%* %3y%*. Be on the lookout
for obvious inefficiencies; however we are trying to be %2correct%* now and
%2efficient%* later.←.

Now that we have saved the values, we need instructions to fetch them back into
%3AC1%* when the time comes.
We will use an instance of the %3MOVE%* instruction ({yonss(P33)}). In this case
our memory reference will be into the stack %3P%*. Indeed it will be relative to
the top of the stack. That kind of addressing is described to our machine
with a memory field of the form "%3n#P%*", where %3n%* designates the offset into 
%3P%* and references the %3n%8th%1 element, counting from zero. Thus in our
current example "%30#P%*" refers to the value for %3y%* and "%3-1#P%*" refers to
%3x%* at the time %3j%* is entered.
Be sure to realize that our addressing is relative; though %3"0#P"%* refers
to %3y%* at entry time, %30#P%* will %2not%* refer to %3y%* when we have pushed
something onto the stack.
Be sure to realize that we %2cannot%* change our relative addressing to hard machine 
locations in the assembler. The addressing must always be relative. We will be 
compiling code for recursive functions. Each recursive call must get a fresh
segment of the value stack in which to store its results. A similar problem appears
when we examine the %3CALL-RET%* mechanism in {yonss(P167)}.

Finally we cannot leave the code for %3j%* as it stands. If the prolog pushes
two entries onto the stack then we had better construct an epilog to remove these
two interlopers, otherwise the stack will not be 
in the pristine state expected by the calling program. As we leave %3j%* we subtract
%32%* from the pointer %3P%*. This operation is called %2⊗→stack synchronization↔←%*.
Finally we exit via %3RET%*. 
Before going into the details of %3CALL-RET%*,
we will display  the code and its execution.
One further embellishment is needed first: since we are defining a function and
turning it into compiled code, we must preface the code sequence with information to
our assembler to designate %3j%* as a %3SUBR%*.

.P165:
.BEGIN TABIT2(10,45);SELECT 3;

\(
\(LAP J SUBR)\%1; says %3j%* is a function%3
\(PUSH P AC1)\%1; save the input args%3
\(PUSH P AC2)
\(MOVE AC1 -1 P)\%1; get %3x
\(CALL 1 (E G))\%1; call the function named %3g
\(PUSH P AC1)\%1; save the value%3
\(MOVE AC1 -1 P)\%1; get %3y
\(CALL 1(E H))\%1; call %3h
\(PUSH P AC1)
\(POP P AC2)\%1; restore the arguments in%3
\(POP P AC1)\%1;   preparation for%3
\(CALL 2(E F))\%1;   calling %3f
\(SUB P(C 2))\%1;sync the stack; remove the two saved args.%3
\(RET)\%1; exit.%3 AC1%* still has the value from %3f.
\   )
.END
As you read the code and as you study its execution you should remember that
the addressing in the code is relative to the top of the stack: %3(MOVE#AC1#-1#P)%*
gets us %3x%* in one instance and gets us %3y%* in another, because the top 
of the stack changes.
Here is a picture of the execution of the code:

.BEGIN TURN ON "\";NOFILL;TABS 5,10,25,30,45,50,65,70;SELECT 3;GROUP;

AC1: x ; AC2: y

\|\| (PUSH P AC1)\\ (PUSH P AC2)\|  y\|(MOVE AC1 -1 P)\| y | (CALL 1 (E G))
\|\|    =>\|  x\|     =>\|   x\|        =>\| x |  =>


AC1: g[x] ; AC2: ?\\\AC1: y ; AC2: ?
\\ (PUSH P AC1)\|g[x]\|(MOVE AC1 -1 P)\|g[x]\| (CALL 1(E H))
\|  y\|      =>\|  y\|      =>\|  y\|   =>
\|  x\|\|  x\|\|  x\|


AC1: h[y] ; AC2: ?\\\AC2: h[y]\AC1: g[x] ; AC2: h[y]
\|g[x]\| (PUSH P AC1)\|h[y]\|   (POP P AC2)\|g[x]\|  (POP P AC1)\| y |(CALL 2 (E F))
\|  y\|       =>\|g[x]\|      =>\|  y\|      =>\| x |   =>
\|  x\|\|  y\|\|  x\|
\    \ \|  x\|


AC1: f[g[x];h[y]]
\|  y\|(SUB P (C 2))\\\|\| (RET) %1return to caller.%*
\|  x\|           =>    \=>


.END


Clearly we want a compiler to produce equivalent (albeit inefficient) code.
Once we understand how to do this it is relatively easy to improve on the efficiency.
.SS(A compiler for simple %3eval%*: the control stack,compiler,P167:)

Before giving a version of the compiler, we should justify our faith in the
%3CALL-RET%* mechanisms. Since we expect to handle recursive calling sequences,
the %3CALL-RET%* pair had better take cognizance of this. However there is a
more fundamental requirement of this pair: They must make sure that on completion of
a %3CALL%*, the %3RET%* returns to the instruction which directly follows the
%3CALL%*.
This requirement can be accomplished by a weaker call, say %3CALL%9'%1,
which stores the current value of the %aPC%* in a known location. Then the
return, %3RET%9'%1, need only pick up that saved value and stuff it into the
%aPC%*. The hardware of the %aSM%* would support such a pair. Recall the
basic machine cycle in {yonss(P33)}; the %aPC%* was incremented %2before%* the
execution of the instruction. Thus when we are about to execute a %3CALL%9'%1
the %aPC%* is already looking at the next instruction; all we need to do
is save the current %aPC%*.

So let's assume that %3CALL%9'%3 F%1 stores the %aPC%* in %3F%* and begins execution
in location %3F+1%1.

.BEGIN TABIT1(19);TURN OFF "←";
%3CALL%9'%* F\%3C(F) ← %3C(%APC%3)%1. Save the %aPC%* in %3F%*.
\%3C(%aPC%*) ← F + 1.%1 Jump to location represented by %3F + 1%*.

%3RET%9'%* F \%3C(%APC%*) ← C(F). 
.END

This pair is indeed how several languages perform their calling sequences.
It's fast and efficient; however it is not sufficient for recursive control.
If we always store in a %2fixed%* location, only the %2last%* store would be
available and previous values set by prior recursions would have been lost.
What we need is a %2⊗→control stack↔←%* counterpart of the value stack.
What the %3CALL%* will do is %2push%* the current contents of the %aPC%* onto the
control stack; and %3RET%* will pop off the top element and put it into the %aPC%* 
register⊗↓What 
will be found on the control stack at any point is a time-sequence of
those procedures which have been entered but have not yet been completed.
Such information is exceptionally useful in debugging programs.←.

However, it is our good fortune that we don't need a separate control stack for
LISP; we can slip the control entries into the value stack, interleaved with the
value information⊗↓This is another good reason for stack synchronization. If
we pick up a spurious return address very mysterious results will occur.←.

Thus %3CALL%* is something  like⊗↓But not identical to, since LISP typically
offers some debugging and tracing features which can intercept function calls.
See {yonss(P168)}.←:

.BEGIN TABIT1(19);TURN OFF"←";
.SELECT 3; GROUP;

CALL FN \C(P) ← C(P) + 1
\C(C(P)) ← C(%aPC%*)
\C(%aPC%*) ← FN, %1and:%*


RET\%3C%*(%aPC%*) ← %3C%*(%3C%*(P)). %1Copy top of stack into %aPC%3.
\%3C%*(P) ← %3C%*(P)-1. %1Decrement pointer.%*

%3RET%1 is also called  %3POPJ P%1.
.END
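
For instance, in the compiled %3j%* of {yon(P165)}, at the moment %3(CALL 1 (E G))%* transfers
control to %3g%*, the stack %3P%* contains, from the top down: the return address (the location
of the %3(PUSH P AC1)%* which follows the %3CALL%*), then the saved values of %3y%* and %3x%*.
%3g%*'s %3RET%* pops that top entry back into the %aPC%*, leaving the two values exactly where
the rest of %3j%*'s code expects to find them.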


.SS(A compiler for simple %3eval%*,compiler)
.BEGIN TURN ON "#";
Now that we know what the runtime code for local variable references %2could%* be,
the crucial point is how to get a compiler to generate such code;
that is, how to generate code which `knows'
where on the stack we can find the value of a local variable when
we execute that code.  What we shall do is simulate the behavior of
the runtime  stack while  we are  compiling the  code.   The compiler
cannot  know what the %2values%* of the  variables will be at runtime but
we claim that it should know %2where%* to find the values.  We will carry
this information through the  compiler in a manner reminiscent of the
`dotted-pair' symbol table of the first version of %3eval%* seen in {yonss(P6)}.  Instead  of
posting the current values  in the stack, the compiler  will post information about  the
positions  of the variables relative  to the top of  the stack at the
time we enter  the function.   The variable pair list, %3vpl%*, 
contains this information.  That is if we are
to compile  a function with λ-variables, %3[x;y;z]%* then %3vpl%* will begin
with:

←%3((X . 1) (Y . 2) (Z . 3) ...%*

When we set up %3vpl%* 
we also set the %2⊗→offset↔←%*,  called %3off%*, to minus the number  of args added
to %3vpl%*, in this case -3.  Now if we come across a reference, say to
%3Y%*, while compiling code, we use %3cdr[assoc[Y;vpl]]%* to retrieve %32%*.  The
offset plus this retrieved value gives us the relative position of %3Y%*
in the stack; that is, -3 + 2 = -1.  Thus to refer to the stack
location of %3Y%* we could use %3(...#-1#P)%*.
That, indeed, is the initial relative position of %3Y%*.

What  happens as we  add
elements to  the stack?  Or to  be more precise, what  happens as we
generate code which when  executed will add  elements to the  stack.
Clearly we must modify the  offset.  If we add one  element, we would
set %3off%* to -4.  Then to reference %3Y%* we now use -4 + 2 = -2; that is, a
reference to %3Y%* is now of the form:

←%3( ......-2 P).%*

But that's right.  %3Y%* is now further down in the run-time stack.  Thus
the `symbol table' is really defined by %3off%*
plus the current %3vpl%*.  Here's a preview of the coming %3compexp%*
in its treatment of variable references, along with
a constructor to make variable-reference
instructions:
.BEGIN ;GROUP;SELECT 3;CENTERIT;

←isvar[exp] → list[mkvar[1;loc[exp;off;vpl]]]%1 where:%3

←loc  <= λ[[x;off;vpl]plus[off;cdr[assoc[x;vpl]]]]%1 and,%3

←mkvar <= λ[[ac;mem]list[MOVE;ac;mem;P]]
.END
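
For instance, with the offset and %3vpl%* above, the compiler builds the reference to %3Y%* as:
.BEGIN CENTERIT;SELECT 3;GROUP;
←mkvar[1;loc[Y;-3;((X . 1) (Y . 2) (Z . 3))]] %1giving%3 (MOVE 1 -1 P).
.END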

Next, when will the compiler make modifications  to the top  of the stack? 
We push new  elements when we are compiling  the arguments to a function
call. We know that %3complis%* is the function  which compiles the  argument list.
Thus our new %3complis%* had better know about %3off%* and %3vpl%*; and,
since %3complis%* changes the state of the stack, it had better
change %3off%* appropriately:

.BEGIN SELECT 3; TABIT2(22,32);GROUP;CENTERIT;

complis <= λ[[u;off;vpl]\[null[u] → ( );
\ %Et%* → append\[compexp[first[u]; off; vpl];
\\ list[mkpush[1]];
\\ complis[rest[u]; off-1; vpl]]]]

←mkpush <= λ[[ac]list[PUSH;P;ac]]
.END

Notice  that   %3complis%*  compiles  the   arguments  from  left   to  right,
interspersing  them with %3(PUSH#P#1)%* and recurring with  a new offset
reflecting the effect of the %3PUSH%*. Clearly this function is analogous to
%3evlis%*.
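
For a tiny worked case, assume %3mkconst[1;A]%* yields %3(MOVEI 1 (QUOTE A))%*; then:
.BEGIN CENTERIT;SELECT 3;GROUP;
←complis[(A B);-2;vpl] %1gives%3 ((MOVEI 1 (QUOTE A)) (PUSH P 1) (MOVEI 1 (QUOTE B)) (PUSH P 1)),
.END
with the inner recursion on %3complis%* made at offset %3-3%*.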

Here's a brief description of the parts of the new compiler. This compiler was 
adapted from one written
by John McCarthy in conjunction with a course at Stanford University
entitled %3Computing with Symbolic Expressions%*. The McCarthy compiler
was also the subject of study by Ralph London in his paper,
%3Correctness of Two Compilers for a LISP Subset%*.

.BEGIN INDENT 0,15;
%3compile[fn;vars;exp]: fn%* is  the name of  the function to be  compiled. %3vars%*  is the
list of lambda variables.  %3exp%* is the lambda-body.

%3prup[vars;n]:  vars%* is  a  lambda list, %3n%* is an  integer.   %3prup%*  builds  a
variable-pair list.


%3loadac[n]%*:  is defined on {yon(P56)}.

%3compexp[exp;off;vpl]%*: this  function does most  of the work.   It is  analogous to
%3eval%*.
It generates the code for constants,
for a reference to a variable,
and for conditional expressions.
It is also used to compile code for a call on a function:
the argument list is compiled with %3complis%*, which will
leave the arguments on the stack; %3loadac%* loads the appropriate %3AC%*'s;
then we generate the %3SUB%* to sync the stack, and finally generate
a call on the function.

%3comcond[u;l;off;vpl]%*: this compiles the body of conditional expressions.  %3u%* is the
p%4i%* - e%4i%*  list; %3l%* will be  bound to a  generated symbol name;
%3off%* and %3vpl%* will always be the offset and the variable-pair list.
.END
.END

Fortified by the previous %3compile%* functions and this introduction,
the new %3compile%* should be clear.

.BEGIN NOFILL ;select 3;TURN OFF "←";

.GROUP
compile <= λ[[fn;vars;exp]
             λ[[n]
               append[mkprolog[fn;n];
                      compexp[exp; -n; prup[vars;1]];
                      mkepilog[n]]
              [length[vars]] ]]

.APART
.BEGIN GROUP CENTERIT;
.SELECT 3;GROUP;
←mkprolog <= λ[[f;n]concat[list[LAP;f;SUBR];mkpushs[n;1]]]

mkpushs <= λ[[n;m]
 	    [lessp[n;m] → ( );
	     T → concat[mkpush[m]; mkpushs[n;m+1]]]]

←mkepilog <= λ[[n]list[mksync[n];mkexit[]]]
←mksync <=λ[[n]list[SUB;P;list[C;n]]]
←mkexit <=λ[[](POPJ P)]
.END

.GROUP
prup <= λ[[vars;n] 
	  [null[vars] → ( );
	   T → concat[cons[first[vars]; n];prup[rest[vars];n+1]]]]


.APART
.GROUP

.BEGIN TABIT1(30);
compexp <= λ[[exp;off;vpl]
     [isconst[exp] → list[mkconst[1;exp]];
      isvar[exp] → list [mkvar[1;loc[exp;off;vpl]]];
      iscond[exp] → comcond[args[exp];gensym[ ];off; vpl];
      isfun+args[exp] → compapply[\fn[exp];
\length[argsf[exp]];
\complis[argsf[exp];off;vpl]]]]
.END

compapply <=λ[[fn;n;arglist]
              append[arglist;loadac[-n];list[mkcall[n;fn]]]]

.APART

.GROUP

comcond <= λ[[u;glob;off;vpl]
             [null[u] → list[mkerror[ ];glob];
              T →append[comclause[first[u]; gensym[];glob;off;vpl];
                        comcond[rest[u]; glob;off;vpl]]]]

.BEGIN TABIT1(17);
comclause <=λ[[p;loc;glob;off;vpl]
               append[\compexp[ante[p];off;vpl];
\list[mkjumpe[loc]];
\compexp[conseq[p];off;vpl];
\list[mkjump[glob];loc]
\]]

.APART
.END
.END


Here is a partial sketch of %3compile%* operating on %3j <= λ[[x;y]f[g[x];h[y]]]%*.
Compare the code it generates with the code we saw on {yon(P165)}.

.BEGIN SELECT 3;TABIT2(10,17);CENTERIT;

.GROUP
compile[J;(X Y);(F (G X) (H Y))] %1gives:%*
\append\[((LAP J SUBR));      
\\ mkpushs[2;1];
\\ compexp[(F (G X)(H Y));-2;prup[(X Y);1]];
\\ ((SUB P (C 2))
\\  (POPJ P))];%1
.APART

.GROUP
where:
←%3mkpushs[2;1]%* gives %3((PUSH P 1)(PUSH P 2)),%* and
←%3prup[(X Y);1]%* gives %3((X . 1)(Y . 2))%*.
.APART
.GROUP

%3compexp[(F (G X)(H Y));-2;((X . 1)(Y . 2))]%* results in:
%3
\append\[complis[((G X)(H Y));-2;((X . 1)(Y . 2))];
\\loadac[2];
\\((CALL 2 (E F)))]

%1and %3loadac[2]%* evaluates to: %3((POP P 2) (POP P 1))%*.
.APART
.GROUP

Thus the code we're getting looks like:

%3
\\((LAP J SUBR)
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ complis[((G X)(H Y));-2;((X . 1)(Y . 2))]
\\ (POP P 2)
\\ (POP P 1)
\\ (CALL 2 (E F))
\\ (SUB P (C 2))
\\ (POPJ P)
\\)

.APART
.FILL;
%3complis%1 is interesting since it actually uses the %3vpl%* we have been
carrying along. It gives rise to:
.NOFILL
.GROUP

%3
\append\[compexp[(G X);-2;((X . 1)(Y . 2))];
\\ ((PUSH P 1));
\\ complis[((H Y));-3;((X . 1)(Y . 2))]]

.APART
.GROUP
%1and the %3compexp%* computation involves, in part:

%3
\append[complis[(X);-2;((X . 1)(Y . 2))];
\\ ((POP P 1));
\\ ((CALL 1 (E G)))]

.APART
.GROUP
%1Finally this %3complis%* generates the long awaited variable reference using:

%3compexp[X;-2;((X . 1)(Y . 2))] %1giving,
\\%3((MOVE 1 -1 P))%*.
.APART

Notice that the offset is different within the call:

←%3 complis[((H Y));-3;((X . 1)(Y . 2))].%1

Finally, you should convince yourself that the code, though inefficient, is correct.
.END
.APART

.CENT(Problems)

I  Simple problems

1. Evaluate %3compile[%1  h.s.] 
etc.

2. Complete the code generation for the above example.


.SS(A project: Efficient compilation)
.P35:
Certainly the most striking feature of the last %3compile%* is its
outrageous inefficiency. 
In the next sections we will be interested in improving the quality
of the code which our compilers produce. This process is called optimization⊗↓One
might be tempted to call our current compiler "pessimizing".←.
Optimization improves the match between  the programming language
and the hardware machine.
Examination of the compilation of even the
most simple function suggests many possible improvements. 

We should first clarify what we mean by efficiency in this context.
We might mean minimal compile-time. In this case we would want a very
simple compiler; this usually means that the compiled code is extraordinarily
bad. Such compilers might suffice for debugging runs or student projects.
More likely, efficient compilation is taken to mean production of
code which we could expect from a reasonably bright machine-language programmer.
It should run reasonably fast, not have obviously redundant instructions, and
not take too much space in the machine. It is this second interpretation
of efficiency which we shall use.

A major inefficiency occurs in saving and restoring  quantities on the
stack. For example, the sequence %3(PUSH P AC1)(POP P AC1)%* serves no
useful purpose. This is a symptom of a more serious disease. 
The compiler does not remember what will be in the ACs at run-time. Since the
arguments to a function call must be passed through the special AC registers
and since it is expensive to save and restore these registers, we should
make a concerted effort to remember what quantities are in which ACs and
not reload them unnecessarily. This process will certainly  make the
compiler more complicated and that will mean longer compilation time but
compilation usually occurs less frequently than execution of compiled
code.

Here are some possibilities.
.SS(Efficiency: primitive operations,,P166:)
First, we should be able to generate references to constants in %3AC%*'s
other than %3AC1%*. 
.GROUP;
For example, the call on %3f[1;A]%* should be generated
as:
.BEGIN GROUP;TABIT1(33);SELECT 3;
\(MOVEI 1 1)
\(MOVEI 2 (QUOTE A))
\(CALL 2 (E F))
.END
There is absolutely no reason to generate the constants and save them in the 
stack.
.APART

We should also expect that the LISP primitive operations, %3car, cdr, cons, eq,%*
and %3atom%* should occur rather frequently in compiled code; and we
should expect that a reasonable compiler be cognizant of 
their existence and compile more efficient code for their execution.
In this section we will enlarge the instruction set of our machine, adding
some plausible operations for these primitives⊗↓These instructions
exist on the PDP-10.←. In the description of these new instructions
%3ac%* will always refer to an %3AC%*-register; %3loc%* will be either
an %3AC%* or a memory location, and %3mem%* will be reserved for memory
references only.

%3CAR%* is an instruction, taking two arguments:
an %3ac%* and a %3loc%* 
respectively. The %3car%* operation is performed from %3loc%* to %3ac%1.
For example when compiling the call, %3f[1;car[x]]%*,
we want to get %3car[x]%* in %3AC2%*. If %3x%* were in %3-5#P%* then we could
accomplish our loading directly by 
%3(CAR 2 -5 P)%1 instead of the incredible:
.BEGIN TABIT1(33);SELECT 3;GROUP;

\(MOVE 1 -5 P)
\(CALL 1 (E CAR))⊗↓%1We have flushed a %3PUSH-POP%* pair.%3←
\(PUSH P 1)
\(POP P 2)
.END

We can also exploit the fact that the second argument to %3CAR%* is a %3loc%*:
the second argument to %3f[1;car[car[x]]]%* could have been compiled as:
.BEGIN TABIT1(33);GROUP;SELECT 3;
\(CAR 2 -5 P)
\(CAR 2 2)
.END

We will assume the existence of an analogous %3CDR%* instruction. With
these two instructions we can significantly improve the code for
%3car-cdr%*-chains.

Another source of  efficiency is available to us. Consider the clause:
.BEGIN CENTER;SELECT 3;
[eq[x;A] → B; ...]
.END
.BEGIN GROUP;
Assuming that %3x%* was on the top of the stack, our current compiler
would generate:
.TABIT1(33);SELECT 3;
\ ( 
\  (MOVE  1 0 P)
\  (MOVEI 2 (QUOTE A))
\  (CALL 2 (E  EQ))
\  (JUMPE 1 L1)
\  (MOVEI 1 (QUOTE B))
\  (JUMP LOUT)
\L1 ...
\  )
.END
The use of predicates in this context does not require the explicit
construction of the constants %et%* and %ef%*. All we need to do is implement
the %3eq%* test as a jump to one of two locations.

We will introduce an instruction %3SKIPE%* taking two arguments:
the first an %3ac%* and the second a %3loc%*. %3SKIPE%* compares
the contents of the two arguments, and if they are equal, it skips
the next instruction. 

.BEGIN TABIT1(33);GROUP;
Thus the above example could be compiled as:
.SELECT 3;
\ ( 
\  (MOVEI 1 (QUOTE A))
\  (SKIPE 1 0 P)
\  (JUMP L1)
\  (MOVEI 1 (QUOTE B))
\  (JUMP LOUT)
\L1 ...
\  )
.END
Notice that we have added an extra piece of knowledge to the compiler;
it knows that %3eq%* is commutative in this instance⊗↓%3eq%* need %2not%*
commute. If there are side-effects in the computation of the arguments, the
order can make a difference. However unless explicitly stated our compilers do
not have to consider side-effects.←. 
We still require some artifacts in the compiler to generate complete
predicates; predicates may be returned as real values. But in many instances,
particularly within %3comcond%*, we can expect to generate tighter code.

Before we can exploit these new operations we need to indoctrinate
the compiler in their proper use. The next section begins that task.
.SS(Efficiency: calling sequences)
Here is the incredibly bad code which the current compiler will produce for the call 
%3f[1;g[3];#car[x]]%*:

.BEGIN tabit1(33);SELECT 3;GROUP
\(MOVEI 1 1)
\(PUSH P 1)
\(MOVEI 1 3)
\(PUSH P 1)
\(POP P 1)
\(CALL 1 (E G))
\(PUSH P 1)
\(MOVE 1 0 P)
\(PUSH P 1)
\(POP P 1)
\(CALL 1 (E CAR))
\(PUSH P 1)
\(POP P 3)
\(POP P 2)
\(POP P 1)
\(CALL 3 (E F))
.END

By way of motivation and introduction, here is what the next compiler does for the
same call:

.BEGIN TABIT1(33);SELECT 3;GROUP;
\(MOVEI 1 3)
\(CALL 1 (E G))
\(MOVE 2 1)
\(MOVEI 1 1)
\(CAR 3 0 P)
\(CALL 3 (E F))
.END

Examination of the code shows the results of several efficiencies:
First we are using  the %3CAR%* instruction of the last section.
We are also doing operations into %3AC%*'s other than %3AC1%*. This
allows us to remove some obnoxious %3PUSH-POP%* sequences.

The major efficiency involves an analysis of the arguments being compiled for
a function call. 
Much of LISP's activity involves function calls. Much of the current 
compiler's inefficiency involves generation of arguments to those calls.
This is a bad combination.
Thus we should concentrate some effort on this area of the compiler.
That part is  %3complis%*.
Within our new %3complis%* we will
divide the arguments into two classes:#trivial and complex.

Trivial arguments are those which need make no demands on the runtime stack;
the computation they entail can all be done in the %3AC%* registers. Thus
the code that the compiler generates need not involve %3PUSH-POP%* sequences.
For example, references to constants need not be generated and then pushed
onto the stack; we can compile the other arguments first and then just before
we call the function, stuff the appropriate %3AC%* register with that
constant. A similar argument can be used for postponing the loading of
variable references⊗↓But note that the argument for variables is shaky;
if our compiler handled programs with side-effects then we could %2not%*
be sure that the postponed value would be the same as that gotten if
we had loaded it at the "proper" time.←. The third trivial construct for this
%3complis%* is the handling of %3car-cdr%* chains. We will use our augmented
instruction set to perform computation of %3car%*s and %3cdr%*s directly
to a specified %3AC%* register.

Complex arguments are those which require some non-trivial computation; they
will be done in the spirit of the previous %3complis%*. That is, calls on
%3compexp%*, interspersed with appropriate %3PUSH%*es. However we can 
even be a bit clever here. After we have computed the %2last%* complex argument
there is no reason to push it into %3P%*; we know we will simply bring it back into
an %3AC%*. It might be an %3AC%* different from %31%*, but it will be an %3AC%*.
Instead of doing the  %3PUSH-POP%* dance, we will  %3MOVE%* the last
complex argument into the correct %3AC%*. Before continuing
you should go back and look at the example code. Convince yourself that the
advertized optimizations are applicable.

Now to the construction of the new %3complis%*. Besides compiling efficient
code we would also like to make the compiler efficient. We would like to make the
compiling process as one-pass as possible. Our basic tasks in the new %3complis%*
are classification of the arguments and compilation of the code. With a little
care we can do both at the same time. There is nothing problematic about
the compilation of the trivial code⊗↓that's why it's trivial?←. We thus
turn to the complex code. 

The old %3complis%* generated a block <code %3arg%4i%1>-%3PUSH%1 on each
cycle. The problem here is that we don't want that last %3PUSH%*; we
want the last one to be a %3MOVE%*. However, when operating in a one-pass
fashion, we don't know when we've seen the last one until it's too late.
We could just generate that erroneous %3PUSH%* and then go in and change it
later. That's not particularly efficient; it would require %3rest%*-ing through
the whole sequence of generated code. What we do instead is generate a
%3PUSH%*-<code#%3arg%4i%1> sequence on each recursion. Now we have a spurious
%3PUSH%* on the %2front%* of the sequence; one %3rest%* will take care of %2that%*.

We must also be generating a list of %3POP%*s to suffix to  the complex code
to get the saved values back into the proper %3AC%*'s; one pop for each argument,
please.
That list will be short and therefore not expensive to manipulate. A moment's
reflection shows two things: first, the list is generated in the reverse order
to the way we need it; second, the %2last%* %3POP%* shouldn't be there since
we have arranged to flush the corresponding %3PUSH%*. However the memory
field of the last %3POP%* has some useful information; it will tell us where
the %3MOVE%* we want to make should go: 
.BEGIN CENTER;SELECT 3;
(POP P N) =>  (MOVE N 1)
.END
This modified list of %3POP%*s is added to the code sequence. Finally, any
trivial code is stuck on behind. With this introduction, here is %3complis%* and
friends:

.BEGIN turn on "\"; no fill;TABs 12,23,35,48;SELECT 3;TURN OFF "←";
.GROUP
complis <= λ[[u;off;vpl]complis%9'%*[u;off;vpl;();();();1]

complis%9'%* <= λ[[u;off;vpl;triv;cmplx;pop;ac]
\[null[u] →\[null[cmplx] → triv;
\\ %et%* → append\[rest[cmplx]
\\\ λ[[z]append[list[mkmove[1;mem[first[z]]]];rest[z]]]
\\\  [reverse[pop]]⊗↓%1Yes, an internal λ-expression; and yes,
%3append[list..%1; see %3mkmove.←
\\\triv]
\\]
\ isconst[first[u]] → complis%9'%*[\rest[u];
\\\off;
\\\vpl;
\\\concat[mkconst[ac;first[u]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]]
\ isvar[first[u]] → complis%9'%*[\rest[u];
\\\off;
\\\vpl;
\\\concat[mkvar[ac;loc[first[u];off;vpl]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]]
\ iscarcdr[first[u]] → complis%9'%*[\rest[u];
\\\off;
\\\vpl;
\\\append[reverse[compcarcdr[ac;first[u];off;vpl]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]]
\ %et%* → complis%9'%*[\rest[u];
\\sub1[off];
\\vpl;
\\triv;
\\concat[mkpush[1];append[compexp[first[u];off;vpl];cmplx]];
\\concat[mkpop[ac];pop];
\\add1[ac]]
\]
.END

.BEGIN CENTERIT;SELECT 3;
←mkmove <= λ[[ac;loc][eq[ac;loc] → (); %et%* → list[MOVE;ac;loc]]]
.END

.BEGIN TABIT3(6,13,23);SELECT 3;TURN OFF "←";GROUP

compcarcdr <= λ[[ac;exp;off;vpl]
\\[iscar[exp] → [isvar[arg[exp]] → list[mkcar[ac;loc[arg[exp];off;vpl]]];
\\\%Et%* → concat[mkcar_ac[ac;ac];compcarcdr[ac;arg[exp];off;vpl]]];
\\ iscdr[exp] → [isvar[arg[exp]] → list[mkcdr[ac;loc[arg[exp];off;vpl]]];
\\\%Et%* → concat[mkcdr_ac[ac;ac];compcarcdr[ac;arg[exp];off;vpl]]]]]
.APART
.GROUP

iscarcdr <=λ[[u]
\\[atom[u] → %et%*;
\\ iscar[u] → iscarcdr[arg[u]];
\\ iscdr[u] → iscarcdr[arg[u]];
\\ %et%* → %ef%*
\\ ]]
.END
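
To see the intended effect of this classification, consider again %3f[1;g[3];car[x]]%* with %3x%* on the top of the stack. The constant %31%* and the chain %3car[x]%* are trivial and are collected in %3triv%*; %3g[3]%* is complex, so its code and a %3(PUSH P 1)%* go to %3cmplx%* while a %3(POP P 2)%* goes to %3pop%*. Since %3g[3]%* is the last (indeed the only) complex argument, its %3PUSH%* is dropped and its %3POP%* becomes the %3(MOVE 2 1)%*, yielding the code displayed at the beginning of this section.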

.SS(Efficiency: predicates)
We have already noted in {yonss(P166)} that some efficiencies are possible in
the handling of predicates inside of conditional expressions. Here we will examine
more possibilities. The first contentious point is that the current
%3comclause%* is %2not%* good enough. We want to be able to use the Boolean
special forms: %3and[u%41%*;#...;u%4n%*]%1 and
%3or[u%41%*;#...;u%4n%*]%1. But part of the definition of these constructs was that
they %2not%* evaluate any more arguments than necessary. The current calling
conventions available inside %3comclause%* only cover call-by-value. To alleviate
this difficulty we will add recognizers for %3and%* and %3or%* inside
%3comclause%* and will add a new section to the compiler to deal with 
their compilation.

First, here is the structure of typical code sequences:

.BEGIN TABIT2(10,40);SELECT 3;GROUP;
\and[u%41%*; ... u%4n%*] → e;\or[u%41%*; ... u%4n%*] → e;
\ %1gives: \gives:%3

\   u%41%*\   u%41%*
\   (JUMPE lint)\   (JUMPN loc)
\   u%42%*\   u%42%*
\   (JUMPE lint)\   (JUMPN loc)
\    . . .\ . . . . .
\   u%4n%*\   u%4n%*
\   (JUMPE lint)\    (JUMPN loc)
\\    (JUMP lint)
\\loc
\    %1<code for %3e%*>\    <code for %3e%*>%3
\   (JUMP lout)\  (JUMP lout)
\lint\lint
.END
.BEGIN GROUP;
 Now here is a %3comclause%* which will generate it:

.TABIT1(17);select 3;

comclause <=λ[[p;loc;glob;off;vpl]
               append[\compred[ante[p];loc;off;vpl];
\compexp[conseq[p];off;vpl];
\list[mkjump[glob];loc]
\]]

.APART
.BEGIN TABIT1(10);GROUP;SELECT 3;

compred <= λ[[p;loc;off;vpl]
\[isand[p] → compandor[args[p];off;vpl;list[mkjumpnil[loc]];()];
\ isor[p] → λ[[z]compandor[args[p];off;vpl;list[mkjumpt[z]];list[mkjump[loc];z]]][gensym[]];
\ %et%* → append[compexp[p;off;vpl];list[mkjumpnil[loc]]]
\ ]]
.APART;
.GROUP;

compandor <=λ[[u;off;vpl;inst;fini]
\[null[u] → fini;
\ %et%* → append[compexp[first[u];off;vpl];inst;compandor[rest[u];off;vpl;inst;fini]]
\]]
.END
.END
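
For a concrete case, take the clause %3[and[atom[x];eq[x;A]] → B; ...]%1 and assume %3mkjumpnil[loc]%* produces %3(JUMPE 1 loc)%*. Then %3compred%* generates
.BEGIN TABIT1(33);SELECT 3;GROUP;
\  %1<code for %3atom[x]%1>%3
\  (JUMPE 1 loc)
\  %1<code for %3eq[x;A]%1>%3
\  (JUMPE 1 loc)
.END
and %3comclause%* follows this with the code for %3B%*, a %3(JUMP glob)%*, and the label %3loc%*, just as in the %3and%* pattern sketched above.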

.CENT(Problems)
%2I%* We should recognize the construct %et%*#→%3e%4i%1 in conditional expressions
and compile special code for it. We should also realize that in the construct:
.BEGIN CENTER;SELECT 3;
[p%41%* → e%41%* ... %et%* → e%4i%*; ...p%4n%* → e%4n%*]
.END
we can %2never%* reach any part of the conditional after the %et%*-predicate.
Therefore no code should be generated. Rewrite the compiler to handle these
additional observations about conditionals. 

The second point, above, is a special instance of a general  compiling question.
How clever should the compiler be? If it can recognize that a piece of program
can never be reached, should it tell the user or should it compile minimal code?

%2II%* Write a new %3compile%* including all the efficiency considerations
discussed so far.
.SS(A compiler for %3progs%*)
The compiler of this section will not compile all %3prog%*s; it is only
intended to demonstrate the salient features of a %3prog%* compiler.
They are:
.BEGIN INDENT 0,5,5;
.GROUP
%21.%* Handling of assignments. Since we are assuming local variables,
storage into the value stack is sufficient⊗↓If we allow non-local
references then we must store into the %3VALUE%*-cells of the appropriate
atoms. To do this efficiently, requires open-coding of %3putprop%*; this
involves list-modification operations which we will discuss in {yonss(P171)}
and {yonss(P155)}; see also {yonss(P5)}.←.
.APART
.GROUP
%22.%* The %3go%*-label pair: we will assume that this can be passed off to the
assembler.
.apart
.group
%23.%*  On leaving a %3prog%*-body we have to flush the %3prog%*-variables
from the top of the stack. This is done by comparing the current %3off%*
with %3vpl%*.
.END
Most of the other features for an inefficient compiler are straightforward;
efficient compilers are much more difficult.

.BEGIN SELECT 3;GROUP;nofill;TABIT1(11);

compprog <=λ[[locals;body;off;vpl]
λ[[n]append[mkpushnil[n];
	compbody[body;off-n;pruploc[locals;-off;vpl];n;gensym[]]
	][length[locals]]]]
.APART
.GROUP

pruploc <= λ[[locals;off;vpl]
	[null[locals] → vpl;
	 %et%* → pruploc[rest[locals];off+1;concat[cons[first[locals];off];vpl]]
	]]

.APART;
.GROUP;
compbody <= λ[[body;off;vpl;n;exit]
[null[body] → list[mkconst[1;NIL];exit;mksync[n]];
 islabel[first[body]] → concat[first[body];compbody[rest[body];off;vpl;n;exit]];

 isgo[first[body]] → append[compgo[arg[first[body]]];
				compbody[rest[body];off;vpl;n;exit]];

 isret[first[body]] → append[sync[off;vpl];list[mkjump[exit]];
				compbody[rest[body];off;vpl;n;exit]];

 issetq[first[body]] → append[compexp[rhs[first[body]];off;vpl];
				list[mkmovem[1;loc[lhs[first[body]];off;vpl]]];
				compbody[rest[body];off;vpl;n;exit]];

 iscond[first[body]] → append[compcondprog[arg[first[body]]];
				compbody[rest[body];off;vpl;n;exit]];

 %et%* → append[compexp[first[body];off;vpl];
				compbody[rest[body];off;vpl;n;exit]]]]
.END
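
As a small check on %3pruploc%*: adding %3prog%*-variables %3[u;v]%* in front of an existing %3vpl%*, starting at position %33%*, gives
.BEGIN CENTERIT;SELECT 3;GROUP;
←pruploc[(U V);3;((X . 1)(Y . 2))] = ((V . 4)(U . 3)(X . 1)(Y . 2)).
.END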


Note: we assume that labels, %3go%*s, and %3prog%* variables are local to a single %3prog%*.

.CENT(Problem)
Write %3compcondprog%*.
.SS(Further optimizations)
This section  is in the nature of hints and possibilities  for expansion of the
basic compiling algorithms.

One of the first things to note about the compiling algorithm is its lack of knowledge about
what it has in the various %3AC%* registers. Typically the prolog of the
compiled code will load up one of the registers with a quantity that is already 
there. Hardly inspired programming. Thus the first suggestion: build a list of 
what's in various registers. We know what's there when we enter the function;
whenever we perform an operation which destroys a register then we have to 
update the compiler's memory. Whenever we need a quantity, we check the
memory. If the object is already in the %3AC%*'s then use it. Clearly there is a
point at which the complexity of the object stored is too complicated to be worth
remembering. However the idea can be used quite profitably for variable references
and simple computations. This idea is a simple form of
%2common sub-expression elimination%*. For example, 
assuming the compiler knows that %3x%* is in %3AC1%*, here's the code for:
.BEGIN CENTER;SELECT 3;

f[car[x];cdr[car[x]]]. %1It's:
.END
.BEGIN TABIT1(33);SELECT 3;GROUP;

\(CAR 1 1)
\(CDR 2 1)
\(CALL 2 (E F))
.END

This idea can be extended. There is nothing sacred about knowing only the
contents of the special registers. We could keep a history of the partial computations
in the stack. Then if we need a partial result we might find it already computed
in the %3AC%*s or stored on the stack. This idea must be used with some care.
Side-effects can destroy the validity of partial results.

Notice that we are comparing the symbolic values in the %3AC%*'s or stack;
we cannot look for actual values. This idea of symbolic processing can be exploited
at a much more sophisticated level in LISP compilers. Since the program to be 
compiled is available as a data structure, we can perform data structure operations
on that program. In particular, we can perform program transformations.
For example, the compiler can rewrite program segments taking advantage of
transformations it knows. These transformations typically involve equivalence
preserving operations which might lead to more efficient compiled code.

For example several LISP compilers have the ability to perform recursion removal,
replacing recursive programs with equivalent iterative versions⊗↓All these
transformations are typically  invisible to the user. Systems which
rewrite users' programs gratuitously are evil.←.
Here's a case:

.BEGIN SELECT 3;CENTERIT;
.p172:
←rev <= λ[[x;y][null[x] → y; %et%* → rev[rest[x];concat[first[x];y]]]]
.END
This program is automatically rewritten as:
.BEGIN SELECT 3;GROUP;TABIT2(10,13);turn off "←";

rev <= λ[[x;y]prog[]
\l\[null[x] → return[y]]
\\y ← concat[first[x];y]
\\x ← rest[x];
\\go[l]
\]]
.END
This second version makes no demands on the run-time stack to save its
partial computations as the recursive version would. Typically 
this second version will execute faster.

A major obstacle to  most kinds of optimization is the unrestricted use
of labels and gotos. Consider a piece of compiled code which has a label attached
to it.
 Before we can be assured of the integrity of an AC
we must ascertain that every possible path to that label maintains that AC.
This is a very difficult task. The label and goto structure required by
%3compile%* is quite simple. However if we wished to build an optimizing  
compiler for
LISP with %3prog%*s we would have to confront this problem.

.CENT(Problems)
%21.%* Extend the compiling algorithm to remember what it has in its %3AC%*
registers. How much of the scheme is dependent on the lack of side-effects?

%22.%* Titled: " If we only had an instruction... "
We are advocating an instruction, %3EXCH ac source%*, which will exchange
the contents of the two referenced cells. This instruction could be used
effectively on the code for %3j%* on {yon(P163)} to save a %3PUSH-POP%*
pair. 
.BEGIN GROUP
Here's %3EXCH%* in action, using the results of  the previous exercise:
.TABIT2(10,45);SELECT 3;

\(LAP J SUBR)\%1; says %3j%* is a function%3
\(PUSH P 2)
\(CALL 1 (E G))\%1; call the function named %3g
\(EXCH 1 0 P)\%1; save the value and dredge up %3y
\(CALL 1(E H))\%1; call %3h
\(MOVE 2 1)
\(POP P 1)\%1;   preparation for%3
\(CALL 2(E F))\%1;   calling %3f
\(POPJ P)\%1; exit.%3 AC1%* still has the value from %3f.
\   )
.END

Look for general situations where %3EXCH%* can be used. In your tour
through the compilers, try to imagine other useful instructions⊗↓Something
less general than %3eval%*, please.←.

%23.%* Write pseudo-compiled code for the factorial function, and simulate
the  execution on %32!%*.

%24.%* Write a LISP function to take recursive schemes into equivalent iterative
ones in the style of the %3rev%* example on {yon(P172)}. Your first
version need not be as efficient as the one advertized there, but try to make it
better as you proceed.
.SS(A project: One-pass assembler)
.P10:

III A one-pass ⊗→assembler↔←.

Write a one-pass assembler for the code generated by the %3compile%* function
of this section. You should be aware of the following points:

.BEGIN INDENT 0,5;

%2a.%*  %3QUOTE%*d expressions must be protected from garbage collection. The
simplest way to accomplish this is to save them on a list, say %3QLIST%*.

%2b.%*  Use the operation codes of {yonss(P7)}.

%2c.%*  Design a simple fixup scheme.  Notice that %3compile%*'s code will
require address fixups at most.

%2d.%*  Recall that the format of the code is a list. The items in the list are
either atomic -- representing labels --, or lists -- representing instructions--.
The instructions have varying format.  Perhaps a ⊗→table-driven↔← scheme can be used.

%2e.%*  Some attempt should be made to generate error messages.

%2f.%*  Many of the constants, %3(C n)%*, occur frequently; these constants
are only referenced, never changed by execution of the compiled code.  Write
your assembler to maintain only one copy of each. The constants should be stored
directly after the compiled code.  

%2g%*.  Try to be equally clever about storing %3QUOTE%*d expressions.
.P36:

IV ***problem on internal lambdas

.END
.SS(A project: Syntax directed compilation,syntax-directed,P4:)

compilation for sae

BNF for mexprs

syntax directed compiler

scanner parser
.SS(A deep-binding compiler,deep binding)

**sketch of tgmoaf deep binder conventions

**project: do tgmoafr and eval d-b compr

**hints about globals and name stack

**project: do eval compiler using name-value stack for locals.

.SS(Compilation and global variables,global variables,P5:)
.BEGIN TURN ON "#";

**** expand greatly***

The models of compilation which we have sketched so far store
their  λ-variables  in  the  stack,  %3P%*.   References  to  those
variables in  the body of  the λ-expression are  made to  those
stack entries.  This scheme suffices  only for lambda or %3prog%* (local)
variables.   We have said that λ-expressions may refer to global
or free  variables.  The  lookup mechanism  simply finds the  current
binding of  that global in the  symbol table.  Notice that  this is a
different strategy than the global lookup of Algol.  In Algol, when a
procedure refers to a global we take the  binding which was current at
the  point of definition  of the procedure.   The  LISP mechanism is
called %2⊗→dynamic binding↔←%*.   It logically corresponds to 
a physical  substitution of 
the body  of the definition for the  function wherever it  is called.
Its model  of implementation is simpler than that required for Algol.
We don't need the ⊗→static chain↔← for this case of  global lookup.  Thus
interpreted expressions encounter  no problems when faced with global
variables.  There are potential  difficulties for compiled code.   If
all we store on the stack in a compiled function is the value of a
variable, then another program which expects to use that variable
globally will have no way of finding that stored value.  One scheme
is to store pairs on the stack: name and value; then we
can search the stack for the latest binding.  It works.  With this
scheme we  can dispense  with the  ⊗→%3VALUE%*-cell↔←.  Actually this  scheme
isn't  all that bad.   The  compiler can still  `know' where  all the
local variables  are on  the stack  and  can be  a bit  clever  about
searching for  the  globals. 

The solution proposed by the %aSM%*-%3eval%1 implementation, called
%2⊗→shallow binding↔←%*, allows the compiled code to directly access the
%3VALUE%*-cell in the symbol table.  There is an artifact in the compiler
(and  assembler) called  %3SPECIAL%*.    Variables which  are  to be  used
globally  are declared ⊗→special variables↔←.   When a  variable, say %3x%*, is
declared special the compiler will emit a reference to %3x%* as 
%3(GETV#AC%4i%*#(SPECIAL#X))%1
or %3(PUTV#AC%4i%*#(SPECIAL#X))%1⊗↓%3GETV%* and %3PUTV%* are sugar coated
machine instructions to access or modify the contents of the %3VALUE%*-cell.←
rather than the corresponding
reference to a location on the  stack.  When the LISP assembler  sees
the indicator %3SPECIAL%*, it will search the  symbol table for the %3VALUE%*-cell
 of %3X%* and  assemble a reference to that cell.  Since the location
of the value  cell does %2not%*  change, we can  always find the  current
binding.  Notice too  that any
interpreted function  can also sample the %3VALUE%*-cell so global values
can be passed between compiled and interpreted functions.  The values
of local variables are still posted on the stack.

Free variables, therefore require some changes to our compiler:
we have to be a bit careful in dealing with free variables which are also used as
λ-variables. Assume a function %3f%* calls a function %3g%*; and assume that
%3g%* uses  some of %3f%*'s λ-variables  as free variables. 
However the usual compilation for %3f%* would place the values in the stack %3P%* 
and they would not be accessible to %3g%*. Our compiler must therefore be
modified to replace the %3PUSH-POP%* code with %3BIND-UNBIND%* pairs for each
variable we expect to be used freely within the context of %3f%*; then any references
to those free variables must involve  %3GETV-PUTV%* rather than 
references into the stack %3P%*.
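
For example, if %3y%* has been declared special then, where the compiler would have produced %3(MOVE 1 -1 P)%* for a stack-bound %3y%*, it instead emits %3(GETV 1 (SPECIAL Y))%*; and where an assignment would have gone into the stack, it emits %3(PUTV 1 (SPECIAL Y))%*.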

We have not yet  discussed the mechanism which will  allow us
to  pass back and  forth between compiled  and interpreted functions.
To complete that discussion we must introduce the %aSM%* instruction for
calling a function:

.BEGIN TABIT1(19);TURN OFF"←";
.SELECT 3;
PUSHJ P fn\C(P) ← C(P) + 1
\C(C(P)) ← C(%aPC%*)
\C(%aPC%*) ← fn. %1This is called the "push-jump" instruction.
.END

First we 
require that the  calling conventions for both kinds  of functions be
isomorphic.

When an interpreted function calls  a compiled (or primitive)
function, %3eval%* will  look for the indicator,  %3SUBR%*; then retrieve the
machine address of the code and enter via a %3PUSHJ%*. That code should exit
(back to %3eval%*) via a %3POPJ%*, after assuring that the stack, %3P%*, has
been restored to the state at the time of entry.

Compiled functions  call other functions  via the %3(CALL#%1n%*#(E#%1fn%3))%1
artifact.  The %3CALL%*  opcode is actually  an illegal instruction
which is trapped to a submonitor inside %3eval%*.  This  submonitor looks
down the  property list  of the  atom, fn,  for a  function indicator
(%3SUBR, EXPR%*,  etc).  The  function is  called  and  control  is then
returned to  the address  immediately following  the %3CALL%*.   In  many
cases this %3CALL%* can be replaced  by a %3(PUSHJ#P#%1fn%3)%*, but not always as
we will see next.
.END
.SS(Functional Arguments,,P3:)

***more h.s.***

***farting with funarg***

funarg b-w. and weizen

**add deep-binding compiler**

What does this say about the %3CALL%* mechanism in the compiler?  It says that
the calling mechanism for a functional argument must always be
trapped by the submonitor and decoded.  We cannot replace that call with
a %3PUSHJ%* to some machine language code for the function because the
function referred to can change.  We actually use a %3CALLF%* instruction
to designate a call on a functional argument.
	
If the function is a variable name we cannot know, at compile time, what to
%3PUSHJ%* to. Indeed, since the value of the expression may very well change
during execution we can %2never%* replace the %3CALL%* with a %3PUSHJ%*.
.SS(Macros and special forms,macros,P57:)
.P18:
Recall our discussion of ⊗→special forms↔← in {yonss(P8)}.
Special forms have been used for two purposes: to control the evaluation
of arguments (conditional expressions, quoted expressions,
%3and, or%*, etc.), and to create the effect of functions with an indefinite 
number of arguments (%3list, append, plus, ...%*).
Frequently this second application of special forms can be improved upon,
particularly in the presence of a compiler.
Describing such functions as special forms is sufficient,
but not efficient, since the body of the definition must
contain explicit calls on %3eval%*. Even though we wish to define
some functions as if they had an arbitrary number of arguments, when
we %6use%* (i.e. call) the function, the number of arguments to be
applied %6is%* known. In particular, when the compiler is examining the
program, the number of arguments is known. Hopefully the compiler can be
made smart enough to compile better code than calls on %3eval%*.
That hope is well-founded as we shall see in {yonss(P57)}.
In the meantime we will discuss a language device called %2macros%*
simply as a notational convenience.

Assume, for example, we wish to define %3plus%* as a function with
an indefinite number of arguments:

.BEGIN CENTERIT;SELECT 3;GROUP;
←plus[4;5] = 9
←plus[4;5;6] = 15
←plus[4;add1[2];4] = 11
.END

That is, %3plus%* is to have the properties of a function: its arguments
are to be evaluated (from left to right); but it can take an arbitrary number
of arguments.
What is needed here seems clear: define %3plus%* in terms of a %2binary%*
addition function, %3*plus%*.

.BEGIN CENTERIT;SELECT 3;GROUP;
←plus[4;5] = *plus[4;5] = 9
←plus[4;5;6] = *plus[4;*plus[5;6]] = 15
←plus[4;add1[2];4] = *plus[4;*plus[add1[2];4]] = 11
.END

That is, we %2expand%* calls on %3plus%* into a composition of calls on
%3*plus%*.
%3plus%* is being used as a %2macro%* and the expansion process in terms
of %3*plus%* is called %2macro expansion%*. Thus macro expansion
generates a composition of calls on %3*plus%*.
Realize too, that since LISP programs must perform equivalently when
interpreted, we must recognize a macro construction inside %3eval%*.

How are macros written and how do they work?  The body of a macro is a 
λ-expression of %2one%* argument. The %2use%* of a macro looks just like
an ordinary function call, but what is bound to the λ-variable is the whole
call on the macro.  When you look at the evaluation mechanism for
macros in the %aSM%*-%3eval%* you will see that the result of the macro
expansion is passed to %3eval%*. Thus the task of the macro body is to
expand the macro call and return this expansion as its value.

.BEGIN TURN ON "#";
Let's define %3<%4m%*=%1 to mean "#is#defined#to#be#the#macro#...".
Then a simple macro definition of %3plus%* might be:
.END

.BEGIN SELECT 3;TABIT2(10,25);GROUP;

.P58:
\plus <%4m%*= λ[[l]\[eq[length[l];3] → concat[*PLUS;cdr[l]]
\\ %et%* → list[*PLUS;second[l];concat[PLUS;rest[rest[l]]]]]]
.END

Thus a call %3(PLUS 3 4 5)%* would bind %3l%* to %3(PLUS 3 4 5)%* and
the evaluation of the body would result in %3(*PLUS 3 (PLUS 4 5))%*.
Evaluation of this expression would result in another call on the macro.
This time %3l%* would be bound to %3(PLUS 4 5)%*. Now
%3eq[length[l];3]%* %2is%* true and the value returned is %3(*PLUS 4 5)%*.
This can be evaluated, giving 9, and finally the outermost call on %3*PLUS%*
has all its arguments evaluated, and we get the final answer, 12.

This is a very simple (and inefficient) macro definition. Since the body
of a macro has available all of the evaluation mechanism of LISP, and
since the program structure of LISP is also the data structure, we can
perform arbitrary computations inside the expansion of the macro. Truly
LISP macros have the power to cloud men's minds.

Notice that %3SETQ%*  can easily be defined as a macro
over %3SET%*:
.BEGIN CENTERIT;SELECT 3;GROUP;

←setq <%4m%*= λ[[l] list[SET;list[QUOTE;second[l]];third[l]]].
.END
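
For example, the call %3(SETQ X (CAR Y))%* binds %3l%* to the entire expression, and the macro body returns %3(SET (QUOTE X) (CAR Y))%*, which is then handed back to %3eval%*.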
.P54:
The effect of "<%4m%*=" should be clear:
it puts the body of the macro definition on the property-list of the macro
name  under the indicator, %3MACRO%*. Likewise "<=" puts the function body
on the p-list under the indicator, %3EXPR%*. Similarly, we will define
"<%4f%*=" to mean "..is defined to be a special form..." and whose effect
is to put the  body under the indicator, %3FEXPR%*. 
Macros can be used to perform the operations which special forms
do, but with perhaps more efficiency. 
Macros can also be used with great verve in implementing abstract data structures.
The constructors, selectors, and recognizers which help characterize
a data structure can be expressed as very simple S-expr operations.
These operations are performed quite frequently in a data structure algorithm
and so any increase in their running efficiency will be beneficial.
On {yon(P73)} we will show how macros can be used to speed up
these data structure primitives.

The idea of macro processing is not recent.  Some of the earliest assemblers
had extensive macro facilities.  Lately macros have been used as a means
of extending so-called high level languages.
One of the simplest kinds of macros is %2textual substitution%*.
That is, when a use of a macro is detected we simply replace the call by its
body.  A slightly more sophisticated application is the %2syntax macro%*.
Every time we come across an application of a syntax macro the expander 
processes it as if it had never been seen before, even though much
of the expansion is repetitious. That is, syntax macros have no memory.

%2Computational macros%* are an attempt to reduce some of this repetition.
In this scheme a certain amount of processing is done at the time the macro
is %2defined%*. For example a computational subtree reflecting the body of the
macro might be formed.  Then whenever the macro is %2used%* we can
simply make a copy of this subtree and "glue" this subtree into the parse-tree
which we are building.  This computational subtree is commonly formed by passing
the body of the macro through the compiler in a 
"funny" way.  The main problem with the computational macro is that there are
many desirable macros which have no such subtree, or there is other information
 necessary to process the macro.  There are solutions to this problem,
one of which closely parallels the abstract syntax ideas of McCarthy.
All of these styles of macros are subsets of the LISP macros.


.CENT(Problems)

I.  Define %3list%* and %3append%* as macros. You may use only
the LISP primitives (functions, predicates, conditionals and
recursion) and a binary function, %3*append%*.

II. Give a macro definition of an extended %3SETQ%*, which is called 
as %3(SETQ#%1var%41%*#exp%41%*#...#var%4n%*#exp%4n%3)%1.
Each var%4i%* is a name; each exp%4i%* is an expression to be evaluated
and assigned to var%4i%1. The assignments should go from "left-to-right".

Thus %3(SETQ X 2 Y (TIMES 2 X) X 3)%* when executed should assign
%33%* to %3X%* and %34%* to %3Y%*.


We wish  to extend our compiler to handle such definitions.

Consider the example of defining %3plus%* of an indefinite number of arguments
given on {yon(P58)}.
Notice first that the difference in efficiency between the interpreted macro ({yon(P58)})
and the interpreted special form ({yon(P59)}) is very slight.  Both require calls on %3eval%*.
Special forms require explicit user-calls on the evaluator; macros do the calls
within the evaluator.
In the presence of a compiler we can frequently make execution of macros
more efficient than their special form counterpart. This is the case with %3plus%*.
When %3plus%* is called we know the number of arguments, and can simply
expand the macro to a nest of calls on %3*plus%*. For example:
.begin centerit;

%3
←plus[4;add1[2];4] %1expands to%* *plus[4;*plus[add1[2];4]] %1
which can be efficiently compiled. 
.end

.P73:
There is a closely related use of LISP macros which is worth mentioning.
Recall on {yon(P60)} we defined %3coef%* 
as %3car%*. Compiled calls on %3coef%* would invoke the function-calling
mechanism, whereas many compilers can substitute actual hardware instructions
for calls on %3car%*, resulting in more efficient run-time code.
So for efficiency's sake it would be better to write %3car%* instead of
%3coef%*. There are two objections to this. First, %3coef%* has more
mnemonic significance than %3car%*. Second, using %3car%* we have explicitly
tied our algorithm to the representation. Both are strong
objections.
Macros can help overcome both objections. Define:

←%3coef <%4m%*= λ[[l]cons[CAR;cdr[l]]]%1.

(Recall that the whole call %3(COEF ... )%* gets bound to %3l%*.)
So the user writes %3(COEF ...)%*; the evaluator sees %3(COEF ...)%* and
finally evaluates %3(CAR ...)%*; and the compiler sees %3(COEF ...)%*
and compiles code for %3(CAR ...)%*. So we can get the efficient code, the
readability, and the flexibility of representation with macros.
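
For instance, if the argument to %3coef%* happens to be on the top of the stack, the compiler, seeing %3(CAR ...)%* after the expansion, can emit the single instruction %3(CAR 1 0 P)%* of {yonss(P166)} where it would otherwise have compiled a %3(CALL 1 (E COEF))%*.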

Since %3eval%* handles calls on special forms, we should examine the
extensions to %3compile%* to generate such code. We have seen that
in compiling  arguments to (normal) functions, we generate the code for
each, followed by code to save the result in the run-time stack, %3P%*.
The argument to a special form is %2unevaluated%*, by definition. All we
can thus do for a call of the form %3f[l]%*, where %3f%* is a special form,
is pass the argument, compiling something like:

.BEGIN CENTERIT;SELECT 3;GROUP;

←(MOVEI AC1 (%eR%f(%3 l %f)%3))
←(CALL 1 (E F))
.END

This should also be clear from the structure of %3FEXPR%* calling in the %aSM%*
evaluator. 


←%2Problems%*

I. Extend the last %3compile%* function to handle macros.

II.  We have seen the (necessarily) inefficient code generated by a compiler
for %3FEXPR%* calls.  Assume ⊗→%3and%*↔← is a special form with an indefinite
number of arguments.  Show how to modify %3compile%* to recognize %3and%*
and compile efficient code for its execution.

.SS(Debugging in general)
Few areas of the Computer Science field are as primitive
as that of  debugging. Few areas of the field are as important. Getting a
correct program  indeed is the whole point of our programming.
The power of our debugging techniques  has been directly related to the
sophistication of the hardware/software interface which is available.
Not until the advent of sophisticated on-line systems has there really 
been any hope for practical "correct-program" construction.

We will begin  with  a  very  brief historical  description  of  systems
organization, from the bare machine to multi-processing.
	In  the  early  days of  computers,  operating  systems  were
non-existent.   You would  sign up for  a couple of  hours of machine
time, appear  with your  card deck  or paper  tape,  and operate  the
machine.  Debugging devices consisted of a sharp pointed stick, and a
quick  hand on the console  switches.  This  means of programming was
very satisfying to many of the programmers, but machine utilization
left something  to be desired.   Much of  the time the  machine would
sit idle (stopped) as  you would think  about what to  do next.   
Debugging and programming were both done at the machine level.


As processing speed and  machine costs increased it became  evident that
this  mode  of operation  had to  go.   The  first  operating systems
consisted of a dull  slow object called  an operator and a  satellite
computer on which an input tape  consisting of many separate jobs was
created.   Each  job  on the  input tape  was sequentially  read into
memory, executed and the output presented to an output tape.  If some
abnormal  behavior  was  detected  in  your program,  you  were  also
presented with an  uninspiring  octal dump of  the contents  of
memory. Memory dumps were an appalling debugging technique, even then.
Programming was frequently done in a higher level language so the contents
of most machine locations had little meaning to the programmer. Also the
state of the machine at the time the dump was taken frequently had only 
a casual relationship with the actual bug.


In the late '50s several people ⊗↓Including J. McCarthy.← began advocating
time-sharing, or interactive systems, which would allow the programmer
to "play with" the machine as if he were the sole user, but when he was
%2thinking%* about his program, the machine could be servicing some
other programmer's needs ⊗↓The hardware and software necessary to support such
behavior is quite interesting, but not relevant to our current discussion.←.
What makes  time-sharing viable  is the tremendous  difference
between human reaction time and machine speeds.  In a period of a few
seconds a well  designed   system can satisfy  simple requests  by
many users.

.SS(Debugging in LISP,,P168:)

When can the  %3CALL%* instruction be  replaced by a %3PUSHJ%*?   The
%3PUSHJ%*  instruction  is  used to  call  a  machine language  function.
Obviously if we are calling an interpreted function (it has
indicator %3EXPR%* or %3FEXPR%*), %3PUSHJ%* is the wrong thing to do.  In this
case  we must pass  control to  %3eval%*, evaluate the  function with the
appropriate arguments,   return the value  in %3AC1%* and finally  return
control to the instruction following the %3CALL%*.  If the function being
called is a  machine language  routine (indicator is  %3SUBR%* or  %3FSUBR%*)
then  we  could  replace   the  %3CALL%*  with  a  %3PUSHJ%*.  This  would
`short-circuit'  the call on  the submonitor  and the calling  of the
function could be  done more quickly.   There  are many occasions  in
which we do not wish to make this replacement, however.

Assume for  the  moment that  I am  mortal and  that my  LISP
program  has some  bugs.   Crucial  pieces of  information  about the
behavior of the program are: which functions are being  entered, what
are  the actual  parameters, and what are the  values being returned.
Assume that  we wish to monitor the behavior of the function, %3FOO%*. We
will hide the  real definition of %3FOO%*  on another symbol table  entry
(using %3gensym[]%*)  and redefine %3FOO%* such  that, when it  is called, it
will:

.BEGIN narrow 10;;

%21.%*  print the values of the current actual parameters.

%22.%*	use %3apply%* to call the real definition of %3FOO%* with the current parameters.

%23.%*	print the value of the call on %3FOO%*.

%24.%*	return control to the calling program.
.END
This device is called tracing.
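
Here is a minimal sketch of such a redefinition for a two-argument %3foo%*; the atom %3G1%* stands for the %3gensym%*-ed name hiding the real definition, and the exact form of the printed information is purely illustrative:
.BEGIN SELECT 3;NOFILL;GROUP;

foo <= λ[[x;y] prog[[val]
	print[list[ARGS;FOO;x;y]];
	val ← apply[G1;list[x;y]];
	print[list[VALUE;FOO;val]];
	return[val]]]
.END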

Since  %3FOO%*  may  be  recursive  we   should  also  give  some
indication of the  depth of recursion being executed.  Now every call
on %3FOO%* will give us  the pertinent statistics.  Interpreted calls  on
%3FOO%* will  go through %3eval%*, and  if %3(CALL ... FOO)%* is being  used in the
compiled  code  the  submonitor  can  pass  control  to  the  tracing
mechanism; but if the %3CALL%* is replaced by a %3PUSHJ%*, the control passes
directly to the machine language code for %3FOO%* and we cannot intercept
the call.

On most implementations  of LISP the %3PUSHJ-CALL%*  mechanism is
under  the  control   of  the  programmer.    After  the  program  is
sufficiently debugged,  replace  the  %3CALL%*  with the  %3PUSHJ%*  and  the
programs will  execute faster.   But  be warned  that this  action is
irreversible on most machines; once the %3CALL%*s have been overwritten 
they cannot be recovered.

A variant of this tracing scheme can  be used to monitor %3SET%*s
and  %3SETQ%*s in  interpreted %3prog%*s.   Since calls  on %3SET%* and  %3SETQ%* are
interpreted (by %3eval%*  and Co.),  we can modify  their definitions  to
print the  name of the  variable and the  new assignment, do  it, and
return.  This technique again is lost in some compiled code. Since we only
compile local variables as references into the value stack, we have lost both
the names and the ability to trace their behavior.

This is a very brief description of debugging in LISP.  It again shows
some of the power resultant from having an evaluator  available at runtime.
Much more sophisticated debugging techniques
can  be implemented,  particularly  in an  on-line  implementation of
LISP. Perhaps the most comprehensive on-line LISP system is INTERLISP, an
outgrowth of BBN LISP. The details of this undertaking would take us too far
afield from our current discussion.
.SEC(Storage structures and efficiency,,P210:)

Though it is true that any algorithm can be coded in terms of
manipulations of binary trees, there are many instances in which more
efficient organizations of data exist.
At the most elementary  level are vectors and arrays.   These
data structures could  certainly be stored in a list structure format
and individual components accessed via %3car-cdr%* chains.  However, most
machines  have a  hardware  organization which  can  be exploited  to
increase   accessing   efficiency  over   the   list  representation.
Similarly, strings can  be represented as  lists of characters.   The
string processing operations are expressible as LISP algorithms.  But
again this is usually not the most reasonable representation. Even at
the level of list-structure operations, simple binary trees might not
be the most  expeditious representation for every problem.  Also many
of the algorithms we  have presented in  LISP are overly wasteful  of
computation time.  This section  will begin an examination of
alternatives   to  LISP  organization.     

There is no best data
representation, nor any best algorithm.  The chosen representation
must match the machine and  the problem domain being studied.  If
the problem  is strictly numerical then list-structure is not the
answer; if simple string manipulation is sufficient 
then list processing is too general.  And there are many applications
of list processing which are sufficiently well-behaved that 
complex devices like garbage collectors are unnecessary.
The point  is that if you have seen the list-processing techniques in
a rich environment  such as  LISP and have  seen the applications  to
which  LISP may  be put,  then you  will be  prepared to  apply these
techniques  in  a  meaningful  way.    Many  times  an  (inefficient)
representation in  LISP  is all  that is  needed.   You  get a  clean
representation  with comprehensible algorithms.   Once you've studied
the algorithms, efficiencies might come to mind. At that  time either
recode the problem using some  of the obscene LISP programming tricks
or drop into machine language if very fine control is required.  Once
you have  %2a%*  representation it  is  easy to  get  better ones.    For
example,  once you have  a compiler  which is   correct  it is
easier to describe a smarter one.
		
This section will describe other representations than LISP binary trees
and will show some ways of increasing the efficiency of LISP programs.

.SS(Bit-tables)
Bit tables: In the marking phase of a garbage collector it is
necessary to  record the visitation of each word.
Frequently it is not possible to place a mark in the  actual word.
This  might occur for  several  reasons: 
.BEGIN INDENT 0,6;
1.  for words in  FS, there is  no
room if each  word contains exactly two addresses; or  

2. the word is
in  FWS and  the  meaning of  the information  stored there  would be
changed if we set a bit.

3. in  garbage collectors for more  complicated data structures, marking
with a bit table becomes a necessity.

.END

An alternative solution is to designate a separate section of memory called a
bit-table. This bit-table is thought of as a sequence of binary flags, and there
is a one-to-one correspondence between the flags and the markable locations.
Whenever we wish to record the visiting of a word, we set the corresponding flag.
Besides solving the problems  stated above, we have an added benefit:
.BEGIN INDENT 0,6;
Recall that the  garbage  collector must  initialize  all  the
mark-bits to  zero before the actual marking  process begins (look at
the g.c. algorithm).  It is faster  on most machines to zero a  whole
table rather than zero single bits in  separate words.
.end
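
The correspondence itself is simple arithmetic: the flag for a word is indexed by
that word's distance from the base of the space. As a sketch (the names
%3bittable%*, %3base%*, %3setflag%*, and %3flag%* are all hypothetical):
.BEGIN CENTERIT;SELECT 3;GROUP;
←set_mark <= λ[[loc]setflag[bittable;difference[loc;base]]]
←is_marked <= λ[[loc]flag[bittable;difference[loc;base]]]
.END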
.SS(Vectors and arrays,,P137:)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
%2Vectors%*:  Vectors (or  one  dimensional arrays)  are  usually
stored sequentially in memory.  Simple vectors are usually stored one
element to a  memory location though  this is not  a necessity.   For
example a complex vector is very  naturally stored as pairs of cells;
or, if vectors of non-homogeneous data modes are allowed, each element
would have to include type information.  In any case,
most languages place some restrictions on the behavior of vectors so
that elements can be accessed efficiently.  There is a natural
simulation of  a stack as  a (sequential) vector  with access  to the
stack made via a global pointer to the vector.

%2Arrays%*: Arrays are multi-dimensional data  structures.  Since
most  machine memories are linear  devices we must map  arrays onto a
linear representation.   We will  restrict attention to  the case  of
two-dimensional  arrays.   Most  of the  discussion generalizes  very
naturally.   A very common device is to store the array by rows; that
is,  each  row is stored sequentially,  first, row 1; then  row 2,...
Given this representation there  is a trivial calculation to find the
location of an arbitrary element, A[i;j], if we know the location  of
the first  element, A[1;1] and  the extent of  the dimensions  of the
array.  For an array A[1:M; 1:N]

←loc[A[i;j]] = loc [A[1;1]] + (i-1)*N + (j-1)

In languages like Fortran which require that the size of the array be
known at compile-time  the compiler can generate the accessing code
without problem.  Languages like Algol 60 and some versions of LISP,
which allow the size of an array to be determined at runtime, require
a bit more care.  Algol 60, for example, requires that you declare the
type (real, boolean,  etc.) of  the array and  specify the number  of
dimensions  in the  array,  but you  can postpone  until  runtime the
designation of the size of each dimension.  To handle this complexity
a dope vector is  introduced. 
A %2⊗→dope vector↔←%*  is typically a header or descriptor associated with
the area containing the actual array elements. The information in the dope vector
tells the functions which access the array how to treat the data.
Type and dimensionality are typical entries in dope vectors.

The compiler can determine  the size of
the dope vector, but  not the contents.  The dope vector is filled in
at runtime and  contains information about the  actual extent of  the
array bounds.   Also since  the size of the  array is not  known, the
compiler  cannot   allocate  space  for  the  array  elements.    The
allocation must be  done at runtime.   When the array declaration  is
executed we must  know all the information about the  array.  At that
time we add the  array-bound information to the  dope vector and  add
information telling where  to find the  array elements.   

Assume that
the  array elements are stored  by rows.  Look  at the calculation of
the location of element A[i;j].  For a specific execution of an array
declaration much of this information is constant: the location of
the array elements (in particular, of A[1;1]) and the number of columns,
N, are fixed.  Thus we rewrite the calculation as:

\constant part\variable part

\ [loc [A[1;1]]-N-1]       +\  	(i*N+j)

The constant  part is stored  in the  dope vector.   When we wish  to
reference  an element A[i;j] we  need only compute  the variable part
and add it to the constant part.
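
The accessing code can thus be packaged as a small function of the dope vector and the
subscripts. This is only a sketch; the selectors %3constpart%* and %3ncols%*, which
extract the constant part and the number of columns from the dope vector, are hypothetical:

←%3access <= λ[[dv;i;j]plus[constpart[dv];plus[times[i;ncols[dv]];j]]]%*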

The dope vector for A[1:M; 1:N] might, for example, contain
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		⊂ααααααααααααααα⊃
		~      2        ~
		εααααααπααααααααλ
		~   1  ~    M   ~
		εααααααβααααααααλ
		~   1  ~    N   ~
		εαααααα∀ααααααααλ
		~ constant part ~
		εαααααααααααααααλ
		~    . . .      ~
		 array elements
		~    . . .      ~
		%ααααααααααααααα$
.END


There is another scheme  for storing arrays which is  used in
some of  the Burroughs machines. Each row  is stored sequentially and
access  to  separate  rows  is   made  through  a  device  called   a
%2⊗→mother-vector↔←%*.   The mother vector is  a vector of pointers  to the
rows.  Thus:



.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TURN OFF "\";
.TABS 50,54;
                                                  
	⊂ααααααα⊃    ⊂αααααααααααααααααα     ?αααα⊃
 	~    #ααβααα→~ A∞411∞*  A∞412∞*   ...?A∞41m∞*?~
  	εαααααααλ    %αααααααααααααααααα     ?αααα$
	~    #ααβαα→ ..
	εαααααααλ
	   . . .

	εαααααααλ    ⊂αααααααααααααααααα     ?αααα⊃
 	~    #ααβααα→~ A∞4n1∞*  A∞4n2∞*   ...?A∞4nm∞*?~
  	%ααααααα$    %αααααααααααααααααα     ?αααα$
.END

Notice that the accessing computation is very  cheap⊗↓However access
by array columns can be expensive. If each row is on a separate
page in the machine, the access overhead can be substantial.←.  Another effect
is  that all rows  need not  be in memory  at once.   On a  paging or
segmenting machine (we  will talk  about machine organization  later)
this array organization can be helpful.  If an access to a row not in
core  is made, a `page  fault' is raised; the  monitor brings the row
into memory and the computation continues.   The mother-vector scheme
generalizes  nicely  to   multidimensionality  and  can  be  used  in
conjunction with a dope vector.

.END
A typical implementation of an array facility in LISP would include
a declaration:

.BEGIN INDENT 0,10;
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... <bounds>], where the
identifier names the array; the type could be numeric or S-expr; and  finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus:
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
  ⊂ααπαααα⊃   ⊂ααααααπαααα⊃ ⊂αααπααα⊃
  ~∃ ~  #αβαα→~ SUBR ~  #αβ→~ # ~ #αβ→ . . .
  %αα∀αααα$   %αααααα∀αααα$ %αβα∀ααα$  
 			      ~				 
			      ~    ⊂ααααααααααααα⊃
			      %ααα→~ dope vector ~
				   ~ calculation ~
				   εαααααααααααααλ
				   ~    array    ~
				   ~  elements   ~
					. . .
				   %ααααααααααααα$
.END
.BEGIN INDENT 10,10;
If we are to store S-exprs in the array then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.

When an array element is to be referenced, then the subscripts are evaluated
(recall that the array name was declared as a %3SUBR%*) and the dope vector code
is executed, resulting in a reference to the appropriate cell.
.END

We also must be able to store information in the array.
.BEGIN INDENT 0,10;
%3store[%1<name>[<subscr>; ... <subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.END
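For example, under these conventions a numeric array might be declared and used as
follows; the exact form of the bounds specification is only a guess:
.BEGIN CENTERIT;SELECT 3;GROUP;
←array[A; FIXNUM; [1;10]; [1;5]]
←store[A[2;3]; 0]
←A[2;3]
.END
The declaration builds the dope vector code and allocates the elements; the %3store%*
places %30%* in the %3[2;3]%* element; and the final reference evaluates its subscripts,
executes the dope vector code, and fetches that element.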
.SS(strings and linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
On {yon(P216)} we discussed one representation for LISP print names:
a linked list of full words, each containing part of the external
name for the atom. Print names are a special instance of a data structure
called a string, and our use of strings in LISP so far has been restricted to
manipulating string constants. In this section we will discuss alternative
representations for strings, and discuss components of a string-manipulating
language. 
The
organization  and  implementation  of   a  general  string  processor
directly parallels LISP.  In fact a subset of LISP,
⊗→linear LISP↔←,  is a nice notation for string algorithms.

A string is a sequence of characters.   A language
for string processing should allow the creation of strings
of arbitrary length at runtime; it should allow the generation of new
strings and the  decomposition of existing  strings.  If  strings of
arbitrary length are  to be created, an organization similar to FS in
LISP can be used with, perhaps, a string garbage collector.   We will
assume this most general case.
The machine memory will  contain at least a ⊗→string  space↔←, an
evaluator, a symbol table, and a garbage collector.

String space is a linear sequence of cells, each of which can
contain exactly  one character.   A string  will be  represented as  a
sequence of contiguous character cells.  A string variable will be an
entry in  the symbol table; the current value of the variable will be
represented as a pair: character count and a pointer to the beginning
of the character sequence.

Thus:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
	⊂αααπααα⊃
	~ 3 ~ # ~
	%ααα∀αβα$
	      ~
	      %α→ααααααααα→ααα⊃
			      ↓
		    . . .ααααπαααπαααπαααπαααπαααπααα . . .
			       A   B   B   1   D   . . .    string space
		    . . .αααα∀ααα∀ααα∀ααα∀ααα∀ααα∀ααα . . .
.END
encodes the string %3ABB%*.

.BEGIN TURN OFF"←";
We must make some decisions about how  we manipulate strings: when we
perform %3x ← y%*, do we copy the symbol table pair of %3y%* into that of %3x%*, or
do we make a new string isomorphic to %3y%* and point %3x%* at it?   It makes
a  difference.    We  will   choose  the  former,  copying  only  the
`descriptor',  that  is,  we  will  share  strings  (and  substrings)
wherever possible. This decision makes the  storage requirements less
expensive, but will make our  life more difficult when we worry about
⊗→garbage collection↔←.   There are  three  primitive functions: ⊗→%3first%*↔←,
⊗→%3rest%*↔←,  ⊗→%3concat%*↔←  (read: %3car%*,  %3cdr%*,  %3cons%*).
.END

.BEGIN INDENT 0,10;
%3first[x]%*  is  the first
character of the string represented by %3x%*.  %3first%* is undefined for the
empty string. For example:
.END

←%3first[ABC]%1 is %3A; first[%1∧%3]%* is undefined.

%3rest[x]%* is  the string  of characters  which remains  when the  first
character of  the string is deleted.  %3rest%* is also undefined for the
empty string. For example:

←%3rest[ABC]%* is %3BC%*

.BEGIN INDENT 0,10;
%3concat[x;y]%* is a function of two arguments.   %3x%* is a character; %3y%* is
a string. %3concat%* forms a string consisting of the concatenation of a
copy of %3x%* and a copy of the string %3y%*.  For example:
.END

←%3concat[A;BC]%* is %3ABC%*
	
There are three string predicates:

.BEGIN CENTERIT;
←⊗→%3char%*↔←%3[x]%1:  is %3x%* a single character?
←⊗→%3null%*↔←%3[x]%1:  is %3x%* the empty string?
←%3x ⊗→%3=%*↔← y%1:  are %3x%* and %3y%* the same character?

For example:

←%3char[A] %1is true
←%3char[AB] %1is false
←%3AB = AB %1is undefined
.END
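
As a small exercise in linear LISP, here is a sketch of a string equality predicate
built from these primitives and predicates:

.BEGIN SELECT 3;TABIT1(22);GROUP;TURN OFF"←";

strequal <= λ[[x;y]\[null[x] → null[y];
\ null[y] → NIL;
\ first[x] = first[y] → strequal[rest[x];rest[y]];
\ T → NIL]]

.END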

	Now to implementations:
%3first%* generates a character  count of 1 and a  pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.

%3concat%* is a  bit more  problematic.   We copy %3x%*  and copy  %3y%*,
generate  a character  count of  the sum  of those  of %3x%*  and %3y%*,  and
generate a pointer to the first character of the copy of %3x%*.  The copies are
made in the free string space pointed to by the %2⊗→string space pointer↔←%*.
The obvious  question of  what to do  when string space  is exhausted
will be postponed  for a moment.   The implementations  of the  three
predicates are easy: we will blur  the distinction between characters
and strings of length 1.  Thus %3char%* need only check the character count.
%3null%* says character count is 0.  What about = ?  Obviously characters
are  not  stored   uniquely,  so  we must  make  an  actual character
comparison.

Now  ⊗→garbage collection↔←.    In  some  ways a  string  garbage
collector is simpler and in some ways more difficult than a collector
of list-structure.   The  marking phase  is much  simpler: using  the
descriptor in the symbol table, mark  the character string.  Since we
are sharing  substrings, we cannot stop the marking simply because we
have encountered a previously  marked character. 
But at least  the marking is not recursive.   However, the collection
phase needs to be  more sophisticated for  string processors.   Since
strings are  stored linearly (or  sequentially), a  fragmented string
space  list  is  of  little  use.    Thus  we  must compact  all  the
referenceable strings into one end of string space, freeing  a linear
block for the new free string space.  Since we are sharing substrings
a  little care  needs  to  be exercised.    When  we move  a  string,
obviously the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location.  So before
we begin the relocation of strings  we sort the string descriptors on
the  basis of their  pointers into  string space.   We then recognize
each parent string, moving it down into freed  locations and updating
the address  pointers of any  substrings.  We  continue this process.
Eventually all strings will  be compacted down  in string space.  The
free space  pointer will  be set  and the  computation can  continue.
Compacting garbage collectors  can be adapted for use in LISP or more
general types of data structures.

.END
.CENT(Compacting GC for LISP)
.P173:
We can combine the simplicity of the original mark-sweep garbage collector
with the sophistication of the sweep phase of the string collector and
develop a subtle compacting garbage collector for LISP.

The intention is to move all active structures to contiguous locations at 
one end of the appropriate space. 
There are several motivations for compacting storage. First, besides making the
%2active%* storage contiguous, we also make the %2free%* locations contiguous.
Thus the free lists can be handled as arrays rather than as lists. To get the next
free element, take the next element in the free array. 

The second reason for considering compaction comes up in languages other
than LISP⊗↓As we shall soon see, the rationale is
applicable in LISP as well.←. If the language allocates storage in a manner similar to LISP
but the constructs allow %2different%* sized blocks to be specified
⊗↓a string processor is a simple example←, then
compaction is a strong contender. To subdue the fragmentation of memory
we could consider compacting all free locations to one end of memory.
More general schemes are also viable. We will talk about these possibilities
when we discuss dynamic allocation.

Another reason for 
concern with compacting is related to hardware. If the underlying machine is
using a paging scheme, then we can try to minimize page-faults by keeping
the LISP structures localized. In the worst case, we could have every element
of a list on a separate page; this would obviously increase machine overhead.
We will say a bit⊗↓sorry← about hardware organization and its effect on language
implementation later. However,
we cannot restrict the operations of the LISP programmer. The underlying hardware
%2must%* be invisible to the user. The next best thing is to try to keep
the structures as local as possible; compaction of spaces is a first attempt
at this. We will discuss other lower-level tricks later.

So granted that compaction is a worthwhile endeavor, how do we proceed?
Clearly we can't simply mark all the
active cells and then move them into unmarked cells to compact the space.
We must also maintain the original topological relationships between the elements.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 40;
	   ⊂αααπααα⊃     ?     ⊂αααπααα⊃     
      204  ~204~204~∞1 is not the same as:∞b   ?100  ~204~204~     
	   %ααα∀ααα$     ?     %ααα∀ααα$                
.END
                                                 
Besides moving the cells, we must also update each reference to a moved location:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 40;
	   ⊂αααπααα⊃     ?     ⊂αααπααα⊃     
      204  ~204~204~∞2 is∞1 the same as:∞b   ?100  ~100~100~     
	   %ααα∀ααα$     ?     %ααα∀ααα$                
.END

To handle these problems, we expand the sweep phase into two phases:
the %2relocate%* phase and the %2update%* phase. As with the sweep phase,
there are relocate and update phases for %2each%* space.

The relocate phase begins after all active structure is marked.
Assume we are to compact all active structure %2down%* to the
bottom of the space.  First we initialize two pointers: a %2free%* pointer
to the lowest cell in the space; and an %2active%* pointer to the top of the space.
We move the active pointer down until we come across a marked location;
we move the free pointer up until we locate an %2unmarked%* cell. 
What we want to do is move that marked cell down into the free location; but
we must also supply enough information to maintain the original relationships
in the transformed structure. The cell we move  may reference other cells
which will be moved.  

.BEGIN GROUP;
Here's a picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
 		     . . .
		   ⊂αααπααα⊃     
	       77  ~ 65~402~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃     
	      100  ~   ~   ~ ←∞1free pointer∞*
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      155  ~   ~   ~ 
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      204  ~402~ 77~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃ 
	      402  ~204~402~ ←∞1active pointer∞*
		   %ααα∀ααα$ 
		     . . .
.END
.END

Cell %b77%* was active so we left it alone; it references cell %b65%*,
which  has already been visited; and also references cell %b402%* which is about
to move. We move the contents of cell %b402%* into cell %b100%*, and to
let everyone know where it has gone, we leave a forwarding address of
%b100%* in location %b402%*.
.BEGIN GROUP;
Thus:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		   ⊂αααπααα⊃     
	      100  ~204~402~ ←∞1free pointer∞*
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃ 
	      402  ~    100~ ←∞1active pointer∞*
		   %ααα∀ααα$ 
		     . . .
.END
.END
The active pointer, having writ, moves on; 
it skips over any unmarked cells, looking for the next marked
location. Assume the next marked
location is %b204%*. It stops there and waits for the free pointer
to discover that location %b155%* is the next free location.
In its search the free pointer will  skip over any marked cells.
The same relocation operation occurs: the contents of %b204%* is moved
to location %b155%*, and the forwarding address of %b155%* is placed in location
%b204%*. The process continues until the two pointers collide.
Call that collision point %acol%*.
When they  meet, all locations above %acol%* will either be free or will
contain forwarding addresses. All addresses, %acol%* and below, will contain
marked words or relocated cells.
We are now ready to enter the %2update%* phase.

.BEGIN GROUP;
Here is the picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
 		     . . .
		   ⊂αααπααα⊃     
	       77  ~ 65~402~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃     
	      100  ~204~402~ 
		   %ααα∀ααα$                
                     . . .
		   ⊂αααπααα⊃     
	      155  ~402~ 77~ 
		   %ααα∀ααα$                
                     . . .   ← ∞acol∞*
		   ⊂αααπααα⊃     
	      204  ~   ~155~     
		   %ααα∀ααα$                
		     . . .
		   ⊂αααπααα⊃ 
	      402  ~   ~100~ 
		   %ααα∀ααα$ 
		     . . .
.END
.END
We examine the initial segment of our space from the bottom to %acol%*
looking for any references to the area %2above%* %acol%*. Such a reference
must be changed. What is found in the referenced cell is not
the desired information, but is the forwarding address of the desired information.
It is obvious what to do: tell the sender what the change of address is.
Thus the %3cdr%*-part of cell %b77%* becomes %b100%*; the %3car%*-part doesn't 
change. Cell %b100%* refers to two relocated cells; we find their forwarding 
addresses, and cell %b100%* becomes:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		   ⊂αααπααα⊃     
	      100  ~155~100~ 
		   %ααα∀ααα$                
.END
Similar treatment is given to cell %b155%*, modifying the %3car%*-part.
When all cells below %acol%* are updated, the garbage collection is finished.
The cells above %acol%* are all available for the free-list.
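
To summarize the relocate phase, here is a sketch of its central loop. Every function
in it is hypothetical: %3marked%* tests the mark bit of a location, %3up%* and %3down%*
give the next higher and next lower locations, %3above%* compares two locations,
%3move%* copies the contents of its first argument into its second, and %3forward%*
records its second argument as a forwarding address in its first. The value returned
is the collision point %acol%*.
.BEGIN TABIT2(6,12);SELECT 3;GROUP;TURN OFF "←";

relocate <= λ[[bottom;top]prog[[free;active]
\\free ← bottom; active ← top;
\a\[above[free;active] → return[active]];
\\[marked[free] → free ← up[free];go[a]];
\\[not[marked[active]] → active ← down[active];go[a]];
\\move[active;free];
\\forward[active;free];
\\free ← up[free]; active ← down[active];
\\go[a]
\]]
.END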

.CENT(Problem)
%21.%* Is %acol%* in the free space list after the update phase?
.SS(%3rplaca%* and %3rplacd%*,,P171:)

.BEGIN TURN ON "←";
We  will  first look  at  some LISP  coding
tricks which, when used judiciously, can increase efficiency, but when
used in a cavalier manner can result in mystery.  First, LISP does an
awful lot of copying.  Consider:

%3←append <= λ[[x;y][null[x] → y;T → concat[first[x];append[rest[x];y]]]]%*

This function copies %3x%* onto the front of %3y%*. Note: %3y%* is not copied.
Or recall the %3subst%* function: it generates  a copy with the correct
substitutions made.   The copying is necessary  in general. It  keeps
unsavory side effects from happening.

Let's look at %3append[(A B C);(D E F)]%*.  It appears that we
could get the appropriate effect of %3append%* by %3rest%*-ing down the list
%3(A B C)%* until we found the terminator, then replace it with a pointer
to the list %3(D E F)%*.  Thus:


.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃   ⊂αααπααα⊃   ⊂αααπαα⊃     ⊂αααπααα⊃   ⊂αααπααα⊃   ⊂αααπαα⊃
~ A ~ #αβαα→~ B ~ #αβαα→~ C ~≤'. . .→~ D ~ #αβαα→~ E ~ #αβαα→~ F ~≤'~
%ααα∀ααα$   %ααα∀ααα$   %ααα∀αα$     %ααα∀ααα$   %ααα∀ααα$   %ααα∀αα$

.END

What's wrong here?  Consider the sequence of statements:

.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;

\i ← (A,B,C)

\j ← (D,E,F)

\k ← append[i;j]
.END

Then if %3append%* worked  as advertised above, (changing the  %3rest%* of the
last element of %3i%*) the following evil would be perpetrated: the value
of  %3i%*  would be  changed  surreptitiously.   %3i%*  now  has   the  value
%3(A,B,C,D,E,F)%*.   Language features which  do this  are evil.   It is,
however,  quite useful to be  evil sometimes.   Notice that any value
which was sharing  part of the structure  of %3i%* will also  be changed.
This can cause real mystery.  Well the world is good, and %3append%* does
not work as above.   The LISP  function which %2does%*  work this way  is
called %3nconc%*.   It  can be  defined in  terms of  a primitive  obscene
function, %3rplacd%*.  There are two primitive obscene functions:


.BEGIN GROUP;
The first is ⊗→%3rplaca%*↔←%3[x;y]%*: replace the %3car%*-part of %3x%* with %3y%*.
For example:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		      AC∞41 ∞*		    AC∞41 ∞*		   
		   ⊂ααααααα⊃		  ⊂ααααααα⊃		    
		   ~   102 ~	          ~    102~
		   %ααααααα$	          %ααααααα$
		      AC∞42 ∞*
		   ⊂ααααααα⊃
		   ~   204 ~	
		   %ααααααα$
			     ∞1==∞3rplaca∞1=>∞b
		     . . .
		   ⊂αααπααα⊃     	 ⊂αααπααα⊃     
	      102  ~222~472~        102  ~204~472~     
		   %ααα∀ααα$             %ααα∀ααα$                

.BEGIN CENTER;SELECT 2;
Example of  ∞3rplaca∞1
.END
.END
.END



.BEGIN GROUP;
The second function is
⊗→%3rplacd%*↔←%3[x;y]%* meaning replace the %3cdr%*-part of %3x%* with %3y%*.
Here's an AMBIT/G ({yon(P158)}) picture for this function.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
		      AC∞41 ∞*		    AC∞42 ∞*		   
		   ⊂ααααααα⊃		  ⊂ααααααα⊃		    
		   ~     # ~	          ~     # ~
		   %αααααβα$	          %αααααβα$
			 ~			~
			 ↓			↓
		   ⊂αααπαααα⊃    	  ⊂ααααααα⊃     
                   ~ ? ~ ? .~. → . . . .→ ~   ?   ~     
		   %ααα∀αααα$             %ααααααα$                
.BEGIN CENTER;SELECT 2;
Algorithm for  ∞3rplacd∞1
.END
.END
.END


.BEGIN GROUP
Thus %3nconc%* can be defined as⊗↓Since we're really grubbing with the representation
we use %3car%* and %3cdr%* rather than %3first%* and %3rest%*.←:
.BEGIN SELECT 3;TABIT1(20);TURN OFF"←";

nconc <= λ[[x;y]prog\[[z]
\ [null[x] → return[y]];
\ z ← x;
\a[null[cdr[z]] → rplacd[z;y];return [x]];
\ z ←cdr [z];
\ go[a] ]]

.END
.END

.BEGIN SELECT 3;TABIT1(30);TURN OFF"←";GROUP;
%1Consider:%*

\x ← (NOTHING CAN GO WRONG);
\rplacd[cdddr[x];cddr[x]];
\print[x];

.END

So we can use %3rplacd%* to generate circular lists (and to generate wall
paper  by printing  them!!).
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In  general,  to circularize  a non-empty
list, %3x%*, %3rplacd[last[x];x]%* suffices where:

←%3last <=λ[[x][null[cdr[x]] → x; T → last[cdr[x]]]]%*

Functions which modify list structure must be  used with  extreme caution.   
They are  not
recommended for amateur hacking.  They are introduced here simply to
show that  it  is  possible to  improve  on the  efficiency  of  pure
algorithms by resorting to these coding tricks.

.BEGIN GROUP
←%2Problems%*

%21.%1  What is the effect of evaluating %3rplacd[x;cdr[x]]%*?

%22.%1 Recall the problem on ⊗→hash consing↔← on {yon(P154)}.
There we were contemplating unique storage for all S-exprs.
Can such a scheme be reconciled (efficiently) with functions like
%3rplaca%* and %3rplacd%*?
.END
.END


.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
.BEGIN TURN ON "←";
We begin with rather simple examples. Consider the  problem of inserting
an element into the middle of a list.  For example let %3x%* be the list,
%3(A B C)%*.  If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform:

.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]].
.END

Indeed, in general, we have little choice but to recopy the initial segment
of %3x%*, adding %3D%* into the appropriate place.  A similar technique is
obviously available to delete a specified element from the interior of a list.

Careful use of %3rplacd%* can, in some instances, replace the heavy use of %3cons%*.
%3cons%* always carries with it the threat of a call on the garbage collector.

For example, given the list %3(A B C)%* with pointers, %3x%* and %3y%*, into it
as follows,

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβαα→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.END


we could insert the element, %3D%*, %2after%* the first element in %3y%* by:

←%3rplacd[y;cons[D;cdr[y]]]%*, giving⊗↓Notice 
that %2one%* %3cons%* is unavoidable.←:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃        ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβα⊃    ⊂→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$ ↓ 	  ↑ %ααα∀αα$
			     ↓    ~
		        ⊂αααπααα⊃ ↑
			~ D ~ #αβα$
			%ααα∀ααα$

.END

But as always, be warned that the value of %3x%* has also been changed; and
any S-expr sharing the list %3x%* or %3y%* as a sublist has also been affected.

We can also use %3rplacd%* to delete not the %2first%*, but the next element,
in %3y%* by:

←%3rplacd[y;cddr[y]].%*

Similarly, we can use %3rplaca%* to modify an element in a list (or S-expr).
To change the first element in the list, %3y%*, to the S-expr, %3z%* use

←%3rplaca[y;z]%*.
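
These operations package naturally as small functions; as a sketch (the names are ours,
and each returns its modified cell):

←%3insert_after <= λ[[y;v]rplacd[y;cons[v;cdr[y]]]]%*

←%3delete_after <= λ[[y]rplacd[y;cddr[y]]]%*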

Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %2after%* and delete %2after%*, rather than insert %2at%* or delete %2at%*.
If you look at a diagram you will see why:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
x	       
~	       
↓   ⊂αααπααα⊃      ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.END

To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell; similarly, to insert %2at%* a specified cell we must modify its predecessor.
How could we write such modifying
functions?  A simple, perhaps inefficient, scheme would be to start another pointer
from the beginning of the list, looking for the cell whose %3cdr%* points to the
desired spot, and then make the modification, as sketched below.  
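
Here is a sketch of that scheme for deletion; the name %3delete_at%* is ours, and
%3cell%* is assumed to occur somewhere past the first cell of %3x%*:
.BEGIN SELECT 3;TABIT2(5,8);GROUP;TURN OFF"←";

delete_at <= λ[[cell;x]
   prog[[p]
\\p ← x;
\a\[eq[cdr[p];cell] → return[rplacd[p;cddr[p]]]]
\\p ← cdr[p];
\\go[a]
\\]]
.END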

If these `modification-%2at%*' functions were to be performed very frequently
then it might be worth starting %2two%* pointers down the list, one at %3x%*,
one say %3y%* at %3cdr[x]%*, as above.  Then %2testing%* could be done using %3y%*
and the %2modification%* could be done using %3x%*. When we move %3y%* to
%3cdr[y]%* we move %3x%* to %3cdr[x]%*.  What happens if we wanted to modify
%2before%* rather than %2at%*? We could proliferate the `back pointers', but
perhaps if this kind of generality is required a change of representation
is called for.  We might resort to the double-linking scheme introduced on 
{yon(P164)}; more complex representations are also discussed
in detail in Knuth's Kompendium, Khapter 2.

The LISP system uses %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists use these functions. Here are the
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.

.BEGIN INDENT 0,10;

%3putprop%* was introduced on {yon(P52)}.
Recall that the effect of %3putprop%* is to attach an indicator-value pair
to an atom. If the indicator is already present then we will simply change its 
value;  
if the indicator is not present then we will add the indicator-value 
pair to the front of the p-list. 
In the definition %3n%* is an atom, %3i%* is an indicator, and %3v%* is 
the value to be stored.
.END

.BEGIN SELECT 3;TABIT2(5,8);GROUP; TURN OFF"←";

putprop <= λ[[n;i;v] 
   prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → rplaca[cdr[m];v];return[v]]
\\m ← cddr[m];
\\[null[m] → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]  ]
\\go[a] 
\\]]
.END
Note that extended conditional expressions are used in the definition.

.BEGIN INDENT 0,10;
%3remprop%* was also introduced on {yon(P52)}.
%3remprop%* is a predicate, used to remove attribute-value pairs from
the property list of an atom.
.END

.BEGIN SELECT 3;TABIT2(5,8);GROUP;TURN OFF"←";

remprop <= λ[[n;i]
   prog[[m]
\\m ← n;
\a\[eq[cadr[m];i] → rplacd[m;cdddr[m]];return[T]]
\\m ← cddr[m];
\\[null[m] → return[NIL]]
\\go[a]
\\]]

.END


Applications of %3rplacd%* will occur inside ⊗→%3ratom%*↔← and
on {yon(P160)}
we will develop a version of LISP's parser which uses pointer modification to
gain efficiency.
Pointer modification is also used
inside the ⊗→garbage collector↔←.  For example the ⊗→sweep phase↔← might
be  described as:

.BEGIN SELECT 3; GROUP;TURN OFF"←"; TABIT2(22,25);

sweep <= λ[[x;y]prog\[[z]
\\z ← NIL;
\a\[nap[x] → z ← rplacd[x;z]]
\\x ← down[x];
\\[eq[x;y] → return[z]]
\\go[a] ]]

.END

Where %3sweep%* is to sweep from %3x%* to %3y%*.  %3nap%* is a predicate
returning %3T%* just in the case that its argument is still marked `not available';
and %3down%* finds the `successor' of its argument in the linear ordering of the
memory cells.

.P161:
Finally, one of LISP's less illustrious uses of pointer modification
is to "allow" the construction of self-modifying programs.
This technique is analogous to the hardware tricks of self-modifying code
and should be used with similar frequency.
The freedom to hang yourself should not be construed as an invitation, but
it again points out the similarities of LISP with machine language and
highlights the differences with LISP's contemporaries.

LISP's  central processor %3eval%* operates by traversing and interpreting
the data structure representation of the program; that data structure is also
open for inspection by LISP's data structure manipulating functions.
Since we now have list-modifying functions, we could conceivably modify a program
by changing its internal structure. Indeed we can write a program which
modifies its %2own%* structure. 

.BEGIN TABIT2(13,16);GROUP;TURN OFF "←";
Here's one:
.SELECT 3;
foo <= λ[[x] prog[[y;z]
\\z←1;
\\y←sixth[body[foo]]
\a\print[x]
\\rplaca[rest[y];z←add1[z]]
\\go[a] 
\\]]

.END
The mystery created by %3y%* is a pointer into the representation of the
statement %3print[x]%*; that representation is %3(PRINT#X)%*. Therefore
the effect of the first %3rplaca%* is to change %3(PRINT#X)%* to %3(PRINT#2)%*.
Subsequent passes through the loop will change the statement to print %33, 4, ...%*.
There really isn't much that can be said about such a program.

.CENT(Problems)
I. More on %3ratom%*.
Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.

II. Recall the function which reverses the top-level elements of a list.
On {yon(P16)} and {yon(P23)}
we wrote it in various styles and with varying degrees of efficiency.
All of these functions used %3cons%*; however it is clear that we should be
able to reverse a list without using %2any%* new cells.
Express this
algorithm as a LISP function. If you use %3prog%* don't use any %3prog%*-variables.

.END
.SS(Numbers,numbers)

In most implementations of LISP, numbers are stored  as very simple kinds of
atoms:  they do not  need print names,  but since  we should probably
allow fixed and floating point representation, we do  need indicators
for these properties.  Thus:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

fixed-point 1
~
↓
~  ⊂αααπααα⊃   ⊂ααααααααπααα⊃   ⊂ααααα⊃ 
%→ ~ ∃ ~ #αβαα→~ FIXNUM ~ #αβαα→~  1  ~
   %ααα∀ααα$   %αααααααα∀ααα$   %ααααα$

floating-point 1
~
↓
~  ⊂αααπααα⊃   ⊂ααααααααπααα⊃   ⊂αααααααααααααα⊃ 
%→ ~ ∃ ~ #αβαα→~ FLONUM ~ #αβαα→~ 201400000000 ~
   %ααα∀ααα$   %αααααααα∀ααα$   %αααααααααααααα$

.END
Notice that each actual number is stored in FWS.  This representation
is a  bit expensive.  On  some machines we can use  a coding trick to
improve representation of some  numbers.  Assume that the  addressing
space of the machine  is 2%818%* and that the usual  size of a LISP
core image is significantly smaller, say, N.  Then all memory address
references greater than N  are illegal (and trapped by  the monitor).
What we  will do is use  these illegal addresses to encode  some of the
smaller positive and  negative integers, mapping  zero on the  middle
address, the  positive numbers to  lower addresses and  the negatives
onto  the  higher   addresses.    Thus  these  smaller  integers, called
%2⊗→inums↔←%*,  are
represented by pointers outside of the normal LISP  addressing space.
This  trick can  considerably decrease  the storage  requirements for
jobs  heavily using small  numbers.  (Numbers are  not usually stored
uniquely).

.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 20;

	⊂αααααααααα⊃ 2∞818∞*
	~?~
	~?~
	~?~ m ∞1<∞* 0
	~?~
	~?~ m = 0
	~?~
	~?~ m ∞1>∞* 0
	~?~
	~?~
	εααααααααααλ N
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	~?~
	%αααααααααα$ 0



.BEGIN CENTERIT;
∞2←Picture of INUM Space∞*
.END
.END
.APART
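
As a sketch of the correspondence (the constant %3inum_zero%* names the middle address
of the illegal region, %3N%* is the top of the legal core image as above, and
%3greaterp%* is the obvious numeric comparison; all of these names are ours):
.BEGIN CENTERIT;SELECT 3;GROUP;
←make_inum <= λ[[m]difference[inum_zero;m]]
←inum_val <= λ[[ptr]difference[inum_zero;ptr]]
←is_inum <= λ[[ptr]greaterp[ptr;N]]
.END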

Most  numerically oriented  programs  are  faced  at some  time  with
overflow.  That is, they attempt  to construct a number  which is too
large to  be represented  in one  machine location.    There are  now
several   versions   of   LISP  which   will   automatically   change
representation  when faced  with overflow.   This  new representation
called Bignums, could have the following structure:


.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

positive big number
~
↓
~  ⊂αααπααα⊃   ⊂ααααααααπααα⊃   ⊂αααπααα⊃        ⊂αααπαα⊃
%→ ~ ∃ ~ #αβαα→~ POSNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
   %ααα∀ααα$   %αααααααα∀ααα$   %αβα∀ααα$        %αβα∀αα$
				  ↓		   ↓
				  N∞40∞*                N∞4n∞*

negative big number
~
↓
~  ⊂αααπααα⊃   ⊂ααααααααπααα⊃   ⊂αααπααα⊃        ⊂αααπαα⊃
%→ ~ ∃ ~ #αβαα→~ NEGNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
   %ααα∀ααα$   %αααααααα∀ααα$   %αβα∀ααα$        %αβα∀αα$
				  ↓		   ↓
				  N∞40∞*                N∞4n∞*


.BEGIN TURN ON "←";
←∞2Structure of a BIGNUM∞*
.END
.END


.BEGIN GROUP;
The value of a Bignum is given by:
.BEGIN CENTER
%9β%1(N%40%* + %9α%1N%41%*+ ... + %9α%8n%1N%4n%1)
.END

where %9β%1 is either + or  - and 
%9α%1-1 is  the largest number  representable in one  machine word.
The intertranslations  between Inums, Fixnums or Flonums, and Bignums
are done automatically within %3eval%* and Co.
.END
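The magnitude computation can be described recursively. Here is a sketch, where the
argument is the list (N%40%*,#...,#N%4n%*) taken from the Bignum's chain, %3alpha%*
names %9α%1, and the sign %9β%1 is applied separately:
.BEGIN CENTERIT;SELECT 3;GROUP;
←bigval <= λ[[l][null[l] → 0; T → plus[car[l];times[alpha;bigval[cdr[l]]]]]]
.END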
.SS(Stacks and Threading)
Though  recursive descriptions of algorithms are usually more illuminating than
the typical machine-oriented counterparts, it is frequently more efficient
to encode those algorithms in manners which can take advantage of contemporary
hardware. This section will discuss two techniques which "unwind" the recursion
and typically lead to faster execution.

Recall the marking phase of a garbage collector in {yonss(P146)}. There
we wrote %3mark%* as a recursive algorithm. We could equally well write %3mark%*
using  an explicit stack:
.BEGIN TABIT2(6,12);SELECT 3;GROUP;TURN OFF "←";
mark <= λ[[tr]prog[[st]
\loop\[is_marked[tr] → go[chk_st]
\\ is_full_wd[tr]→ mark[tr];go[chk_st]
\\ is_free_wd[tr]→ st←push[cdr[tr];st];mark[tr]; tr←car[tr];go[loop];
\\ %Et%* → go[chk_st]]
\chk_st\[null[st] → return[%et%*]]
\\tr←top[st];
\\st←pop[st];
\\go[loop]
\]]
.APART;
.group;
.CENTERIT;
←push <= λ[[i;st]concat[i;st]]
←top <= λ[[st]first[st]]
←pop <= λ[[st]rest[st]]
.END

Notice that we only save the %3cdr%*-node in the stack; even at that the
stack grows proportionally to the depth of the tree being traversed. 
See the problem on {yon(P162)}.
The technique of using an open stack sometimes is more intuitive and 
sometimes will lead to faster execution.

The second technique is more  tricky but will lead to significant pay-offs
in execution time⊗↓with a proportional loss in clarity of code←.
The technique is called %2⊗→threading↔←%*.
The basis for threading is a desire to traverse tree-structures in a more
efficient fashion than that typically available in recursion or via
stacks. Recall that on {yon(P164)} we surmised 
that %2⊗→double-linking↔←%* might be
advantageous in moving up and down the "spine" of a tree structure.
Double links would allow us to find the successors and predecessors of nodes
with equal verve. However the extra link gives us no relief when we wish
to go down into the substructure. It is this area to which threading addresses 
itself: descent into tree structure, and in fact, non-recursive tree traversal.
Threading  comes in various flavors; we will now discuss a few.

Look at the new %3mark%*-er above; you should notice that for a
%2fixed%* tree and a %2fixed%* order of traversal⊗↓say, %3car%* then %3cdr%*
for lack of imagination←, any two applications of marking will
have the same pattern of behavior. The order of visitation to each node
will obviously be the same, but the dynamic changes in the state of the stack
will %2also%* be the same. Instead of replicating that portion of the stack,
it might be possible to store the stack information in the structure itself.
This technique of hiding the control structure in the data structure is called
threading. 
The general application of threading is of questionable utility. It typically
requires a more complex data structure: we must be able to store  both
threads and links. The traversing algorithms also become more complex:
we obviously must be able to recognize the difference between 
control threads and data links. Care must also be taken if we wish to share
threaded list structure. See Knuth's Kompendium for a complete discussion
of the techniques and tricks.

We are not about to complicate the LISP structures, but dispensing with a 
stack, be it implicit or explicit, %2does%* have some merits. What we %2can%*
do in LISP is  strike a compromise. Instead of storing the threads permanently
in the structure, there are significant applications of threading
where we can %2temporarily%* store threads as we traverse trees. 
The first application is in the design of a non-recursive %3read%* program;
the second application we will describe is in the mark phase of a garbage
collector.
.CENT(Problems)
.P162:
With a little
more testing before stacking we can significantly cut down  the
number of %3push%*es we have to do. Namely, if some of the branches point
immediately to atoms we might as well mark them at that time and
proceed without doing a stack operation. Only when both branches are "non-atomic"
do we need to stack the %3cdr%*. Write such an algorithm.
.CENT(A non-recursive %3read%*)
.P160:
The original %3read%* algorithm of {yonss(P13)} is a good example of
a clear recursive algorithm;
it is reasonably straightforward to follow the flow of the algorithm.

However, now that we understand what the run-time behavior of such a recursive
program is, we can see that %3read%* and friends are a drain on %2two%*
resouces: they use free-space to %3cons%*-up the S-expr representation of the input;
they use  the run-time stack for  handling  the implementation of the
recursion and for saving parameters and intermediate computations. A deeply
nested expression will use a lot of the run-time stack.
Clearly, there is nothing we can do about the drain of the free-space lists⊗↓Yes
lists, plural; we probably will be drawing on the full word area for print name
storage.←, but we %2can%* dispense with the run-time stack by threading.
We can in fact do so without a proportional increase in the use of cells in
free space; indeed we need only %2one%* additional free cell, regardless
of the complexity of the input! This is truly a win for efficiency; it will be
a big loss for clarity. But again that's why this section on storage and efficiency
is where it is; we have an understanding of the purpose and intent of %3read%*;
only after the basic algorithm is well understood should we attempt to be
clever and efficient.
First we describe the basic ideas of the algorithm; then we give the algorithm;
finally  we supply an epilogue.

The main idea  in the algorithm is the realization that we really can determine
the storage requirements for a complex S-expr or list structure as we read it in.
For example consider %3(A#(B#C)#D)%*. As we start our left-to-right scan of the input
we see %3(%*. This immediately tells us that we need at least one %3cons%*.
We read %3A%*; that tells us what the %3car%* of the expression is. Notice that
we don't yet know whether the expression is "dotted" or "listed", but the
storage requirements will be the same. On reading the next open parenthesis we
know we need to add a new level in the  developing representation. The
%3B%* and %3C%* add elements to that level, and the closing parenthesis finishes
it off. The closing parenthesis also should signal our parser to return to the
prior level and continue scanning the input. The %3D%* goes on that
level and the final closing parenthesis completes the input.
All this is old but the difference now is that we don't use recursion or an explicit
stack to keep track of where we are. We keep a thread in the %3cdr%*-part of 
the last cell on every level. When we go down a level we manufacture a new
cell with the %3cdr%* pointing to the cell we just came from in the previous
level; this happens when we see a left parenthesis. We go up a level when we see
a right parenthesis; that is done by following up the thread in the current level,
after doing appropriate cleanup. 

There are three basic states in the reader: 
.BEGIN INDENT 5,5,5;group;
%21.%*  The next input should go into the %3car%*-part of the current cell.
This state is entered when we go down a level. It is labeled %3head%* in the
following program.

%22.%*  The next input should go on the current level. This is the typical state
in the building of a list-input. Here we add a new cell in the current level 
and put the input in the %3car%*-part of that cell; then stay in this state.
This state corresponds to label %3tail%*.

%23.%*  The other main state occurs on reading a dot when in %3tail%*-state⊗↓dots 
seen in  any other context are errors←. In dot-state we check the next input;
if it's an atom we stuff it on the thread and go up. If it's a  left parenthesis
we add  a new cell and go down.
.END
There are some anomalies in the algorithm due to the recognition of both S-exprs
and list-notation. The main malefactor is a count of parentheses; it increments for
left parentheses and decrements for right parentheses. A legal input has been
recognized when we are back at the top level and the count is zero.

The final difference between the old parser and the new one involves
the scanner, %3ratom%*. We assume a new %3ratom%* which reads %3()%* and
returns %3NIL%*. This is not too strenuous an assumption. If the scanner
sees an open parenthesis, it looks ahead to the next meaningful character⊗↓ignoring
spaces and the like←. If the character is a closing parenthesis, it takes it;
if the character is not, it is left for the next call on %3ratom%* and
%3ratom%* returns with an indication that it has seen a left parenthesis.
With this introduction, here is the new %3read%*:

.BEGIN SELECT 3; TABIT3(6,19,34);TURN OFF "←";GROUP;

read <= λ[[]prog[[j;cp;count;top;temp]
\count←init[]; cp←count;top←cp;
head\j←ratom[]
\[or[is_dot[j];is_rpar[j]] → err[];
\ is_lpar[j] →\incr[count];
\\cp←down[cp];
\\go[head];
\ atom[j] → stuff[cp;j];go[ckend]]
tail\j←ratom[]
\[atom[j] →\cp←insert_move[cp;j];go[ckend]
\ is_rpar[j] →\decr[count];
\\[eq[top;cp] → go[ck1];
\\ %et%* →  cp←stuff_up[cp;NIL];go[ckend]]
\is_lpar[j] →\incr[count];
\\cp←down[insert_move[cp;NIL]];
\\go[head];
\is_dot[j] →\j←ratom[];
\\[or[is_dot[j];is_rpar[j]] → err[]
\\ is_lpar[j] →\incr[count]
\\\cp←insert_move[cp;NIL]
\\\go[head];
\\ atom[j] →\cp←stuff_up[cp;j]
\\\go[ckend]]]
ckend\[eq[cp;top] → go[ck1]
\ %et%* → go[tail]]
ck1\temp← cnt[top]
end2\[zerop[temp] → return[exp[top]]
\j←ratom[]
\[is_rpar[j] → temp←sub1[temp]; go[end2];
\ %et%* → err[] ]
]]

.APART;
.GROUP
.TURN ON  "∞" FOR "←"; NOFILL;
∞init <= λ[[]cons[NIL;0]]
∞stuff <= λ[[x;y]rplaca[x;y]]
∞incr <= λ[[z]rplacd[z;add1[cdr[z]]]]
∞insert_move <= λ[[cp;val]rplacd[cp;cons[val;cdr[cp]]];cdr[cp]]⊗↓%1We are
using an extended form of λ-notation: %3λ[[x%41%*;#...#x%4n%*]%9x%41%3;#...%9x%4m%3]%1
 where we evaluate the %9x%1's from left to right, returning the value of %9x%4m%1
as the value of the λ-expression.%3←
∞down <= λ[[cp]rplaca[cp;cons[NIL;cp]];car[cp]]
∞stuff_up <= λ[[cp;j]prog[[temp]temp←cdr[cp];rplacd[cp;j];return[temp]]]
∞cnt <= λ[[x]cdr[x]]    
∞exp <= λ[[x]car[x]]

.END
The development and understanding of this algorithm required most of what we
have covered in the course. 
We used our knowledge of the parser, %3read%*; we used our familiarity
with S-exprs stored as linked lists; we had to understand the run-time control
of recursive calling sequences; we had to understand pointer manipulation;
we had to understand pointer modification; and finally we had to be wickedly
clever.
With that understanding we
were  able to apply threading at a level higher than a "once-only" trick.
.CENT(Problems)
%21.%* Write a version of %3read%* which uses an explicit stack to remember where it
is in the parse.

%22.%* Write an extension to the evaluator to handle the extended λ-notation
advertised in the non-recursive reader.

.NEXT PAGE;
.CENT(A link-bending garbage collector for LISP)
.P144:
The use of a stack is one of the difficulties associated with garbage 
collection. Garbage collection is invoked when available space has become
exhausted, but here we are, asking for %2more%* space to use for stacking.
The usual solution is to allocate a separate area for stack storage. This 
has its drawbacks. If we don't allocate enough stack space, i.e., if the
depth of a piece of structure becomes too great, then the marker will fail.
The amount of stack space can become large; proportional to the
length of a list.
We can apply threading here,
modifying the structure as we traverse it; as usual the threads will be used as
control information. As we finish marking a branch we
restore the structure to its original topology.
Certainly the marker will be more complicated, but the complication is
not insurmountable. In fact, it is even reasonably straightforward to prove
the correctness of such an algorithm ⊗↓ W. deRoever, private communication.←.

  **** what goes here is example of link-bending gc

.SS(Dynamic allocation and LISP)
LISP's most general data object is a dotted pair; however we frequently
find ourselves using dotted pairs to represent more structured objects.
For example, many common LISP programs involve list-operations on
list representations. But lists, we know, are representations of sequences.
From {yonss(P137)}
we now also know that arrays are efficient representations of sequences.
Indeed array representations are typically more efficient than the
general LISP linked-list.  If we could capitalize on this more 
efficient representation without jeopardizing the LISP operations, then we would
win.

An analysis of the LISP operations shows that we have to be able to %2share%*
substructures, and if using %3rplaca%* or %3rplacd%*, we have to be able to
%2modify%* existing structures. Any proposed economies in storage layout
must take cognizance of these facts. Fortunately these requirements are compatible.
Consider the typical  representation of the sequence:
.BEGIN CENTER;SELECT 3;

(LAMBDA(X)(F X Y))
.END

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
	⊂ααααααααπααα⊃   ⊂αααπααα⊃  ⊂αααπαα⊃  
	~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
	%αααααααα∀ααα$   %αβα∀ααα$  %αβα∀αα$ 
			   ~	      ↓
		⊂αααααααααα$     ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
		↓		 ~ F ~ #αβ→~ X ~ #αβ→~ Y ~≤'~
		⊂αααπαα⊃	 %ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
		~ X ~≤'~  
		%ααα∀αα$

.END

This takes seven words. The %3car%*-part  of each cell contains the information;
the %3cdr%*-part tells where the rest of the expression is to be found.
That is, we have dedicated 14 half-words to represent the structure; only seven
of which contain the actual information we wish to store. 
Using some extra encoding we can carry the same information in 
seven⊗↓slightly larger← cells.

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααααα⊃
~↓ LAMBDA ~
εαααααααααλ     ⊂αααααααα⊃
~↓      #αβαααα→~/     X ~
εαααααααααλ     %αααααααα$
~/      #αβα⊃
%ααααααααα$ ↓   ⊂αααααααα⊃
	    %αα→~↓     F ~
		εααααααααλ
		~↓     X ~
		εααααααααλ
		~/     Y ~
		%αααααααα$

.END

The intent of the special characters  is to encode information about
the %2next%* cell in the representation. It thus is called %2⊗→%3cdr%*-coding↔←%1.
Thus %b↓%* means the next cell %2is%* the %3cdr%*; %b/%* means the %3cdr%* is 
%3NIL%*.

The typical LISP cell is a third variety of %3cdr%*-coding: the code %b→%* says
the next cell is a pointer to the %3cdr%*. With that, we introduce the final code:
%b*%* means this cell is the %3cdr%*-half of a LISP word.
Thus %3(A#B)%* could be expressed in any of the following forms:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααα⊃			⊂ααααα⊃
~↓  A ~			~→  A ~
εαααααλ			εαααααλ    ⊂ααααα⊃
~/  B ~			~*  #αβααα→~/  B ~
%ααααα$			%ααααα$    %ααααα$


⊂ααααα⊃			 	⊂αααααπααααα⊃    ⊂αααααπααααα⊃
~→  A ~				~  A  ~   #αβααα→~  B  ~  ≤' ~
εαααααλ   ⊂ααααα⊃		%ααααα∀ααααα$    %ααααα∀ααααα$
~*  #αβαα→~→  B ~
%ααααα$   εαααααλ
	  ~* NIL~
	  %ααααα$

.END
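As a sketch of how the codes might be interpreted when taking a %3cdr%*, assume a
hypothetical selector %3code%* which extracts the code of a cell, a function %3next%*
which gives the physically following cell, and a function %3contents%* which fetches the
pointer stored in a %b*%*-cell; the names %3DOWN%*, %3NILCODE%*, and %3JUMP%* stand for
the codes %b↓%*, %b/%*, and %b→%*:
.BEGIN SELECT 3;TABIT1(24);GROUP;TURN OFF"←";

coded_cdr <= λ[[cell]\[eq[code[cell];DOWN] → next[cell];
\ eq[code[cell];NILCODE] → NIL;
\ eq[code[cell];JUMP] → contents[next[cell]]]]

.END
The %b*%* code never appears here since %3coded_cdr%* is only applied to cells which
hold %3car%*-parts.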
However this encoding scheme is not sufficient as it stands. Consider the 
following example: Given internal pointers %3x%* and %3y%* into:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ F ~ #αβαα∀αα→~ Y ~ #αβαα→~ Z ~≤'~
    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$

.END

and assume we wish to perform %3rplacd[x;(A#B#C)]%*.
In our standard implementation we would arrive at:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

x	       y
~	       ~
↓   ⊂αααπααα⊃  ↓   ⊂αααπααα⊃   ⊂αααπαα⊃
%αα→~ F ~ # ~  %αα→~ Y ~ #αβαα→~ Z ~≤'~
    %ααα∀αβα$      %ααα∀ααα$   %ααα∀αα$
          ↓
          ~ ⊂αααπααα⊃      ⊂αααπααα⊃   ⊂αααπαα⊃
          %→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
	    %ααα∀ααα$      %ααα∀ααα$   %ααα∀αα$
.END

But if we assume %3(F X Y)%* is represented in its compact form
we have some troubles.
We can't replace the cell:

.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";

⊂ααααα⊃            ⊂αααααα⊃
~↓  Y ~		by ~*   #αβ→ to ∞3(A B C)∞*
%ααααα$            %αααααα$
.END
since %3y%* would get justifiably upset. What we will do is use the
forwarding address scheme we introduced on {yon(P173)} in the compacting
garbage collector.
We put a forwarding address in the cell referenced by %3y%*;
then allocate a new pair of half-cells, putting %3F%* in the first and a pointer to
%3(A#B#C)%* in the second.
Now everyone is happy.

These forwarding addresses are an instance of %2⊗→invisible pointers↔←%*
proposed by R. Greenblatt in his LISP machine; he has also proposed
implementing %3cdr%*-coding in hardware.
Between invisible pointers and 
%3cdr%*-coding, we %2can%* effect all LISP operations using this
potentially more compact representation. Thus we %2are%* interested in compacting
garbage collectors in LISP for reasons related to fragmentation of space; we
may want to allocate variable-sized blocks in LISP.

***add discussion of garbage collection***
.SS(Hash linking)


.SS(Representations of complex data structures,,P138:)

do sequential allocation

 do sequences and records and dynamic allocation

 do machine organ: paging

dynamic allocation